CROSS-YEAR MULTI-MODAL IMAGE RETRIEVAL USING
SIAMESE NETWORKS
M. Khokhlova (1,2), V. Gouet-Brunet (1), N. Abadie (1), L. Chen (2), ICIP 2020.
(1) IGN and (2) LIRIS
WiMLDS Paris meetup
Presenter: Margarita Khokhlova, somewhere in the world, September 10, 2020.
The research problem
The Alegoria project aims to promote institutional iconographic collections that depict
the French territory across periods ranging from the interwar years to the present day.
Some examples from the project's vast archive resources
Archive resources:
Cartothèque de la Reconstruction, Fonds Archives nationales (Fonds LAPIE (1955-65), Cartothèque de la
Reconstruction (1948-1976)), aerial views:
https://www.siv.archives-nationales.culture.gouv.fr/siv/IR/FRAN_IR_050605
Fonds Musée Nicéphore Niépce (Fonds de l’entreprise CIM (1949-1974), Fonds Bouquet (1914-1918), etc.),
postcards, aerial views.
Project benchmarks and data
Digitized archive data:
Aerial and ground-view images with and without corresponding metadata.
[Figure: aerial and ground-view images, different angle.]
Multi-modal cross-temporal data covering all of France:
BD TOPO and ORTHO from IGN (multiple versions starting from 2004) [1].
▪ Aerial photographs (100% vertical)
▪ Manually annotated industrial and natural objects in vector form
Many are available as open source.
1. IGN geo data: https://geoservices.ign.fr/
FR-0419
Multi-modal cross-temporal vertical images database:
Research question: can we retrieve the same geozone across time?
Annotations: semantic objects in the image and the department.
Ground truth sources: matching geozones from the aerial images and semantic
maps from IGN (BD TOPO/ORTHO databases), used to create the multi-modal dataset
FR-0419.
Research scope of this work
Goal:
To perform multi-modal retrieval of aerial images representing the same
geographic location across time:
▪ How do modern cross-view and standard image descriptors handle this task?
▪ How can we use the semantic data to improve the search results?
▪ Which modality is more important for the across-time search?
[Figure: 2004-2019 changes.]
Multi-modal cross time database
3 selected departments in the East of France:
#  Department          Images (high-res BD ORTHO)  Patches (2000x2000 pixels, 1 square km)
1  Moselle             327                         6000
2  Bas-Rhin            248                         4430
3  Meurthe-et-Moselle  291                         5855
[Figure: coverage of Moselle by BD ORTHO.]
Visual and Semantic data
[Figure: 2004 image, 2004 semantic label, 2019 image, 2019 semantic label.]
Semantic categories selected
Table: most important selected semantic categories and time changes
Baseline: Evaluation Pipeline
Evaluation metric: MAP@N.
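A minimal sketch of how MAP@N can be computed for a retrieval run. The slide's own formula is not reproduced here; this assumes a common definition (precision averaged over the ranks of the relevant items found in the top N, normalized by min(N, number of relevant items)). The function names are illustrative only.

```python
from typing import Hashable, Sequence, Set


def average_precision_at_n(ranked: Sequence[Hashable], relevant: Set[Hashable], n: int) -> float:
    """AP@N for one query: precision@k summed over the ranks k <= N that hit a relevant item."""
    if not relevant or n <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    # Assumption: normalize by the best achievable number of hits within the top N.
    return precision_sum / min(n, len(relevant))


def map_at_n(all_ranked, all_relevant, n: int) -> float:
    """Mean of AP@N over all queries."""
    scores = [average_precision_at_n(r, rel, n) for r, rel in zip(all_ranked, all_relevant)]
    return sum(scores) / len(scores) if scores else 0.0
```

With a single correct match per query, AP@N reduces to 1/rank of that match when it appears in the top N, and 0 otherwise.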
Baselines
Classical
Model: ResNet50 [2]
Backbone: ResNet50
Pooling: MaxPooling
Image resolution: 512x512
Pre-trained on: ImageNet
Final descriptor size (single dims): 2048
Cross-view image retrieval
Model: GeM [3]
Backbone: ResNet101
Pooling: GeM layer
Image resolution: 1024x1024
Pre-trained on: oxford5k, paris6k,
roxford5k, rparis6k
Final descriptor size (single dims): 2048
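The two pooling choices above can be illustrated on a raw feature map. This is a sketch in NumPy, assuming a backbone output of shape (channels, height, width); the function names and the exponent p = 3 are illustrative assumptions, not the settings used in the talk.

```python
import numpy as np


def max_pool_descriptor(feature_map: np.ndarray) -> np.ndarray:
    """Global max pooling: (C, H, W) -> (C,)."""
    return feature_map.max(axis=(1, 2))


def gem_pool_descriptor(feature_map: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized-mean pooling per channel: (mean over H, W of x^p)^(1/p)."""
    clipped = np.clip(feature_map, eps, None)  # GeM assumes non-negative activations
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)


# Hypothetical feature map with 2048 channels, giving a 2048-dimensional descriptor.
rng = np.random.default_rng(0)
fmap = rng.random((2048, 16, 16))
d_max = max_pool_descriptor(fmap)
d_gem = gem_pool_descriptor(fmap)
```

With p = 1 GeM reduces to average pooling, and as p grows it approaches max pooling, so the exponent interpolates between the two.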
2. He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer, Cham, 2016.
3. Radenović, Filip, Giorgos Tolias, and Ondřej Chum. "Fine-tuning CNN image retrieval with no human annotation." IEEE Transactions on Pattern Analysis and Machine Intelligence 41.7 (2018): 1655-1668.
Multi-modal information usage
Data fusion
▪ Images
▪ Semantic labels
▪ Early fusion (concatenation)
▪ Late fusion
▪ Fusion by a convolutional layer
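[Figure: fusion schemes. Early fusion: D2048 + D2048 -> D4096 -> KNN. Late fusion: D2048 -> KNN and D2048 -> KNN, then similarity-based ranking. Fusion by a convolutional layer: D2048 -> KNN.]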
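A rough sketch of early versus late fusion for retrieval, written with NumPy. It is a simplification under stated assumptions: cosine similarity is used for the search, and late fusion is approximated by a weighted sum of the per-modality similarities over the whole database (the `weight` parameter is hypothetical). Fusion by a convolutional layer needs training and is not shown.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def cosine_similarity(queries: np.ndarray, database: np.ndarray) -> np.ndarray:
    return l2_normalize(queries) @ l2_normalize(database).T


def early_fusion_ranking(q_img, q_sem, db_img, db_sem, k=5):
    """Concatenate image and semantic descriptors (D + D -> 2D), then run one search."""
    q = np.concatenate([q_img, q_sem], axis=1)
    db = np.concatenate([db_img, db_sem], axis=1)
    return np.argsort(-cosine_similarity(q, db), axis=1)[:, :k]


def late_fusion_ranking(q_img, q_sem, db_img, db_sem, k=5, weight=0.5):
    """Score each modality separately, then rank by the combined similarity."""
    sims = weight * cosine_similarity(q_img, db_img)
    sims = sims + (1.0 - weight) * cosine_similarity(q_sem, db_sem)
    return np.argsort(-sims, axis=1)[:, :k]
```

Both functions return, for every query, the indices of the k best database entries, which can be scored directly with map@5.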
Baseline results
Trends:
▪ Descriptors on semantic masks give better results than descriptors on natural images.
▪ Multi-modal data give better accuracy than the single modalities.
▪ Late fusion gives the best results for both baselines.
▪ Pre-trained off-the-shelf ResNet50 outperforms the more sophisticated GeM on our data
(low-res).
Table: map@5 off-the-shelf descriptor GeM
Table: map@5 off-the-shelf descriptor ResNet50
Method to improve the baseline and fine-tune on our data
Related research problems:
One-shot learning - the case when very few training examples (or just a single one) are
available to train the model:
▪ Face recognition
▪ Object recognition [4,5]
▪ Omniglot symbols recognition dataset [6]
Common solutions:
Siamese and Triplet Networks:
▪ Cross Entropy Loss
▪ Contrastive Loss
▪ Triplet Loss
4. Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in Neural Information Processing Systems. 2016.
5. Qiao, Siyuan, et al. "Few-shot image recognition by predicting parameters from activations." CoRR, abs/1706.03466 1 (2017).
6. Lake, Brenden, et al. "One shot learning of simple visual concepts." Proceedings of the Annual Meeting of the Cognitive Science Society. Vol. 33. No. 33. 2011.
Siamese architecture for multi-modal data
Definitions:
X_1, S_1; X_2, S_2 - input aerial image and a corresponding semantic map:
matching pairs and non-matching pairs.
Y - ground truth correspondence level
D_R - resulting descriptor
Loss function & training
A simple binary cross-entropy loss (Eq. 3).
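A small NumPy sketch of a binary cross-entropy loss on descriptor pairs. Two things are assumptions, not taken from the slides: Y is treated as binary (1 for a matching pair, 0 otherwise), and the match probability comes from a sigmoid over a weighted L1 distance between the two descriptors, as in common siamese one-shot models. The names `w` and `b` are hypothetical trainable parameters.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def match_probability(d1: np.ndarray, d2: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Assumed head: weighted L1 distance between descriptors, squashed to (0, 1)."""
    return sigmoid(np.abs(d1 - d2) @ w + b)


def binary_cross_entropy(y_true: np.ndarray, p_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Mean of -[y log p + (1 - y) log(1 - p)] over the batch."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))
```

An uninformative prediction of 0.5 for every pair gives a loss of log 2, which is a handy reference value when monitoring training.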
Training data:
Training set: Moselle
Validation: Bas-Rhin
Test: Meurthe-et-Moselle
Cross-validation: swapping the departments
Hard mining: every 5 epochs, based on the wrong matches returned by the KNN algorithm.
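One way the hard-mining step could look, as a sketch: run a nearest-neighbour search with the current descriptors and keep every query whose top match is wrong. The function name, the Euclidean distance and the `true_index` array are assumptions made for illustration.

```python
import numpy as np


def mine_hard_negatives(query_desc: np.ndarray, db_desc: np.ndarray, true_index: np.ndarray):
    """Return (query, wrong_match) index pairs where the nearest neighbour is not the true match."""
    # Pairwise Euclidean distances, shape (n_queries, n_database).
    diffs = query_desc[:, None, :] - db_desc[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = dists.argmin(axis=1)
    wrong = np.flatnonzero(nearest != true_index)
    return [(int(q), int(nearest[q])) for q in wrong]


# Toy example: queries 1 and 2 are closer to the wrong database entries.
queries = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
database = np.array([[0.0, 0.1], [5.0, 5.1], [1.0, 1.1]])
hard_pairs = mine_hard_negatives(queries, database, np.array([0, 1, 2]))
```

In a training loop this would be called when `epoch % 5 == 0`, and the returned pairs would be fed back as non-matching examples.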
Experiments
Hyperparameters:
Descriptor size: 128, 256, 512
Training: 100 epochs
Optimizer: Adam, lr = 8e-04 with decay
Batch size (BS): 12 (6 random and 6 hard)
Distance: L1 and L2
KNN distance metric: cosine and euclidean
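A sketch of a KNN search with a switchable distance metric, assuming brute-force search in NumPy (the function name and signature are illustrative). For L2-normalized descriptors the two metrics give the same ranking, since the squared Euclidean distance equals 2 - 2 * cosine similarity; they can differ on unnormalized descriptors.

```python
import numpy as np


def knn_indices(queries: np.ndarray, database: np.ndarray, k: int = 5, metric: str = "cosine") -> np.ndarray:
    """Indices of the k nearest database entries for each query."""
    if metric == "cosine":
        qn = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        dn = database / np.linalg.norm(database, axis=1, keepdims=True)
        dist = 1.0 - qn @ dn.T
    elif metric == "euclidean":
        dist = np.linalg.norm(queries[:, None, :] - database[None, :, :], axis=2)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(dist, axis=1, kind="stable")[:, :k]
```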
Comparison with another method:
SimCLR* unsupervised descriptor learning [7]: NT-Xent loss and contrastive learning.
7. Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." arXiv
preprint arXiv:2002.05709 (2020).
Results
Department    Best baseline  Partition   map@5  Cross-validation partition  map@5
Moselle       0.76           training    0.96   validation                  0.96
Bas-Rhin      0.75           validation  0.93   testing                     0.91
M-et-Moselle  0.84           testing     0.97   training                    0.97
Table: map@5
Fine-tuning the siamese network gives up to ~20 points of map@5 improvement over the best
baseline (gains of 0.13 to 0.20 across the three departments):
[Figure: network architecture; labels: ResNet50, Conv, 3conv, FC, Descriptor.]
Comparison with the NT-Xent Loss
Method                    R     map@5 training: Moselle  map@5 validation: Bas-Rhin  map@5 testing: M et Moselle
Contrastive NT-Xent loss  128   0.77                     0.62                        0.52
Contrastive NT-Xent loss  2048  0.83                     0.73                        0.61
Ours                      128   0.96                     0.93                        0.97
Ours                      2048  0.61                     0.71                        0.58
Parameters:
Total epochs: 100
BS: 26 pairs
Image resolution: 256
NT-XENT τ (temperature): 100
Augmentation: Color jitter and random rotations
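For reference, a compact NumPy sketch of the NT-Xent loss as defined in SimCLR [7]: for each anchor, the positive is its paired view, and all other samples in the batch act as negatives. The function name is illustrative and the temperature is passed in explicitly; this is not the training code used in the comparison.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float) -> float:
    """NT-Xent over N positive pairs (z1[i], z2[i]); mean loss over the 2N anchors."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = (z @ z.T) / temperature
    np.fill_diagonal(sim, -np.inf)  # an anchor is never its own candidate
    # Row i's positive is the other view of the same sample.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())
```

A lower temperature sharpens the softmax and penalizes hard negatives more strongly; a very high one flattens it toward a uniform distribution over candidates.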
Ablation & error analysis
Final descriptor size:
R     map@5 training: Moselle  map@5 validation: Bas-Rhin  map@5 testing: M et Moselle
128   0.96                     0.93                        0.97
256   0.96                     0.92                        0.97
512*  0.86                     0.70                        0.80
Table: map@5 for different descriptor sizes (latest results in brackets); * unstable training.
[Figure: erroneous matches returned by the KNN algorithm. Most errors concern forested zones.]
Ablation & parameters
Color:
Grayscale semantic data vs RGB
Normalization:
Batch normalization in all the added layers vs none
Activation:
Tanh activation in all the added layers
Losses:
Focal loss vs BCE loss.
Image size:
256 & 512; however, the latter allowed only very small batches and led to
unstable training.
Conclusion
▪ A novel approach for learning from multi-modal data that uses fusion to
fine-tune a CNN-based image descriptor; any backbone can be used.
▪ The resulting descriptor is powerful enough to distinguish between
images that are semantically close and is robust to gradual
landscape changes over time:
just 128 values in a single descriptor
map@5 averaged over the test and validation sets is 0.94
▪ New multi-modal dataset extracted from the BD TOPO/ORTHO IGN
- a unique and rich source of information for geo exploration.
Questions & Answers
Thank you for your attention!
To find out more, please check out the publication:
1. Margarita Khokhlova, Valerie Gouet-Brunet, Nathalie Abadie and Liming Chen. Recherche
multimodale d’images aériennes multi-date à l’aide d’un réseau siamois, RFIAP 2020.
2. https://github.com/margokhokhlova/siamese_net
Or contact me:
margarita.khokhlova@ign.fr
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 

Recently uploaded (20)

College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Cross-Year Multi-Modal Image Retrieval Using Siamese Networks by Margarita Khokhlova, Research Scientist (Post-Doc) at LIRIS

  • 1. CROSS-YEAR MULTI-MODAL IMAGE RETRIEVAL USING SIAMESE NETWORKS. M. Khokhlova (1,2), V. Gouet-Brunet (1), N. Abadie (1), L. Chen (2), ICIP 2020. (1) IGN and (2) LIRIS. WiMLDS Paris meetup. Presenter: Margarita Khokhlova, somewhere in the world, September 10, 2020.
  • 2. The research problem. The Alegoria project aims to promote institutional iconographic collections that describe the French territory across periods ranging from the interwar years to the present day. Some examples from the project's vast archives: aerial and ground-view images taken from different angles. Archive resources: Archives nationales (Fonds LAPIE (1955-65), Cartothèque de la Reconstruction (1948-1976)), aerial views: https://www.siv.archives-nationales.culture.gouv.fr/siv/IR/FRAN_IR_050605 ; Musée Nicéphore Niépce (Fonds de l’entreprise CIM (1949-1974), Fonds Bouquet (1914-1918), etc.), postcards, aerial views.
  • 3. Project benchmarks and data. Digitized archive data: aerial and ground-view images, with and without corresponding metadata. Multi-modal cross-temporal data covering all of France: BD TOPO and ORTHO from IGN (multiple versions starting from 2004) [1]: ▪ aerial photographs (100% vertical) ▪ manually annotated industrial and natural objects in vector form. Many are openly available. [1] IGN geo data: https://geoservices.ign.fr/
  • 4. FR-0419. Multi-modal cross-temporal database of vertical images. Research question: can we retrieve the same geographic zone across time? Annotations: the semantic objects in the image and the department. Ground truth sources: matching geographic zones from the aerial images and the semantic maps from IGN (TOPO/ORTHO databases), used to create the multi-modal dataset FR-0419.
  • 5. Research scope of this work. Goal: to perform a multi-modal search for aerial images representing the same geographic location across time (changes from 2004 to 2019): ▪ How do modern cross-view and standard image descriptors handle this task? ▪ How can we use the semantic data to improve the search results? ▪ Which modality is more important for the across-time search?
  • 6. Multi-modal cross-time database. 3 selected departments in the East of France:
      #   department           images, high-res BD ORTHO   patches, 2000x2000 pixels (1 square km)
      1   Moselle              327                         6000
      2   Bas-Rhin             248                         4430
      3   Meurthe-et-Moselle   291                         5855
      Figure: coverage of Moselle by BD ORTHO.
  • 7. Visual and semantic data. Figure panels: 2004 image, 2004 semantic label, 2019 image, 2019 semantic label.
  • 8. Semantic categories selected. Table: most important selected semantic categories and their changes over time.
  • 10. Baselines.
      Classical. Model: ResNet50 [2]. Backbone: ResNet50. Pooling: max pooling. Image resolution: 512x512. Pre-trained on: ImageNet. Final descriptor size (single dimension): 2048.
      Cross-view image retrieval. Model: GEM [3]. Backbone: ResNet101. Pooling: GEM layer. Image resolution: 1024x1024. Pre-trained on: oxford5k, paris6k, roxford5k, rparis6k. Final descriptor size (single dimension): 2048.
      [2] He, Kaiming, et al. "Identity mappings in deep residual networks." European conference on computer vision. Springer, Cham, 2016.
      [3] Radenović, Filip, Giorgos Tolias, and Ondřej Chum. "Fine-tuning CNN image retrieval with no human annotation." IEEE transactions on pattern analysis and machine intelligence 41.7 (2018): 1655-1668.
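The GEM layer named on slide 10 refers to generalized-mean pooling from [3]: each channel of the final feature map is reduced to one value by taking the p-th root of the mean of the activations raised to the power p. The following is a minimal, dependency-free sketch of that formula only, not the authors' code. The function names, the toy feature map, and the value p = 3 are illustrative assumptions.

```python
def gem_pool(channel_activations, p=3.0, eps=1e-6):
    """Generalized-mean pooling of one channel's spatial activations."""
    # Clamp to a small positive value so that x ** p is well defined.
    clamped = [max(a, eps) for a in channel_activations]
    return (sum(a ** p for a in clamped) / len(clamped)) ** (1.0 / p)


def describe(feature_map, p=3.0):
    """feature_map: list of channels, each a flat list of spatial activations.

    Returns one pooled value per channel, i.e. the global descriptor.
    """
    return [gem_pool(channel, p) for channel in feature_map]


# Toy feature map with 2 channels and 4 spatial positions (assumed values).
feature_map = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
avg_like = describe(feature_map, p=1.0)   # p = 1 reduces to average pooling
max_like = describe(feature_map, p=50.0)  # a large p approaches max pooling
```

With a 2048-channel backbone output this yields the 2048-value descriptor listed on the slide; the exponent p interpolates between the average pooling and max pooling behaviours.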
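  • 11. Multi-modal information usage. Data to fuse: ▪ images ▪ semantic labels. Fusion strategies (diagram): ▪ early fusion (concatenation): two D2048 descriptors are concatenated into one D4096 descriptor, followed by KNN ▪ late fusion: a separate KNN on each D2048 descriptor, followed by similarity-based ranking ▪ fusion by a convolutional layer: a single D2048 descriptor, followed by KNN.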
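The difference between early and late fusion on slide 11 can be sketched in a few lines. This is an illustration under assumptions, not the authors' pipeline: the slide does not say how the two per-modality rankings are merged in late fusion, so the sketch simply sums the two cosine similarities, and the 2-value toy descriptors stand in for the 2048-value ones.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def early_fusion_rank(q_img, q_sem, db_img, db_sem, k=5):
    """Concatenate image and semantic descriptors, then run one search."""
    query = q_img + q_sem  # list concatenation: D + D -> 2D
    scores = [cosine(query, i + s) for i, s in zip(db_img, db_sem)]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def late_fusion_rank(q_img, q_sem, db_img, db_sem, k=5):
    """Score each modality separately, then rank by the summed similarity.

    Summing the two similarities is an assumed merging rule.
    """
    scores = [cosine(q_img, i) + cosine(q_sem, s) for i, s in zip(db_img, db_sem)]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


# Toy database of 3 zones with 2-value descriptors per modality (assumed data).
db_img = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_sem = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
early = early_fusion_rank([1.0, 0.1], [0.1, 1.0], db_img, db_sem, k=3)
late = late_fusion_rank([1.0, 0.1], [0.1, 1.0], db_img, db_sem, k=3)
```

In early fusion the modality whose descriptor has the larger norm dominates the concatenated vector, whereas late fusion scores each modality on its own scale before merging.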
  • 12. Baseline results. Trends: ▪ Descriptors on semantic masks give better results than descriptors on natural images. ▪ Multi-modal data give better accuracy than the single modalities. ▪ Late fusion gives the best results for both baselines. ▪ The pre-trained off-the-shelf ResNet50 outperforms the more sophisticated GEM on our data (low-res). Table: map@5, off-the-shelf descriptor GEM. Table: map@5, off-the-shelf descriptor ResNet50.
  • 13. Method to improve the baseline and fine-tune on our data. Similar research problems: single-shot learning, the case when very few training examples (or just a single one) are available to train the model: ▪ face recognition ▪ object recognition [4,5] ▪ Omniglot symbol recognition dataset [6]. Common solutions: Siamese and triplet networks, trained with: ▪ cross-entropy loss ▪ contrastive loss ▪ triplet loss.
      [4] Vinyals, Oriol, et al. “Matching networks for one shot learning.” Advances in Neural Information Processing Systems. 2016.
      [5] Qiao, Siyuan, et al. “Few-shot image recognition by predicting parameters from activations.” CoRR, abs/1706.03466 (2017).
      [6] Lake, Brenden, et al. “One shot learning of simple visual concepts.” Proceedings of the Annual Meeting of the Cognitive Science Society. Vol. 33. No. 33. 2011.
  • 14. Siamese architecture for multi-modal data. Definitions: (X1, S1), (X2, S2): input aerial images with their corresponding semantic maps, forming matching and non-matching pairs. Y: ground-truth correspondence level. DR: resulting descriptor.
  • 15. Loss function & training. Loss: a simple binary cross-entropy loss (Eq. 3). Training data: training set: Moselle; validation: Bas-Rhin; test: Meurthe-et-Moselle. Cross-validation: swapping the departments. Hard mining: every 5 epochs, based on the wrong matches returned by the KNN algorithm.
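The equation for the loss did not survive in this transcript, so the sketch below shows only the standard binary cross-entropy, L = -(Y log p + (1 - Y) log(1 - p)), where Y is the ground-truth label of a pair and p the predicted probability that the pair matches. How the network produces p is not stated here. The `match_probability` head (a sigmoid over a weighted L1 distance between the two descriptors) is a hypothetical choice for illustration, as are all names and values.

```python
import math


def bce_loss(y_true, p_pred, eps=1e-7):
    """Standard binary cross-entropy for one pair (y_true is 0 or 1)."""
    p = min(max(p_pred, eps), 1.0 - eps)  # avoid log(0)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def match_probability(d1, d2, weights, bias):
    """Hypothetical head: sigmoid of a weighted L1 distance between descriptors."""
    z = sum(w * abs(a - b) for w, a, b in zip(weights, d1, d2)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy descriptors (assumed values); negative weights make distance lower p.
weights, bias = [-1.0, -1.0], 2.0
p_same = match_probability([0.2, 0.9], [0.2, 0.9], weights, bias)
p_diff = match_probability([0.2, 0.9], [3.0, -2.0], weights, bias)

loss_confident_right = bce_loss(1, 0.9)
loss_confident_wrong = bce_loss(1, 0.1)
```

The loss is small when a matching pair receives a high probability and grows quickly when the prediction is confidently wrong, which is what pushes matching descriptors together during fine-tuning.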
  • 16. Experiments. Hyperparameters: descriptor size: 128, 256, 512; training: 100 epochs; optimizer: Adam, lr = 8e-04 with decay; batch size: 12 (6 random and 6 hard); distance: L1 and L2; KNN distance metric: cosine and Euclidean. Comparison with another method: SimCLR* unsupervised descriptor learning [7]: NT-Xent loss and contrastive learning.
      [7] Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." arXiv preprint arXiv:2002.05709 (2020).
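The batch composition of 6 random and 6 hard pairs, with hard pairs taken from wrong KNN matches, could be organized roughly as follows. This is a sketch under assumptions: the data structures, the function names, and the fallback to random pairs when too few hard pairs exist are all hypothetical and do not come from the slides.

```python
import random


def mine_hard_pairs(knn_results, true_match):
    """Collect (query, wrong_neighbour, label) triples from a retrieval run.

    knn_results: dict mapping a query id to its ranked list of retrieved ids.
    true_match:  dict mapping a query id to the id of its correct match.
    """
    hard = []
    for query_id, neighbours in knn_results.items():
        for candidate in neighbours:
            if candidate != true_match[query_id]:
                hard.append((query_id, candidate, 0))  # label 0: non-matching
    return hard


def build_batch(random_pool, hard_pool, n_random=6, n_hard=6, rng=None):
    """Draw a mixed batch; tops up with random pairs if hard ones run short."""
    rng = rng or random.Random(0)
    batch = rng.sample(hard_pool, min(n_hard, len(hard_pool)))
    batch += rng.sample(random_pool, n_random + n_hard - len(batch))
    return batch


# Toy data (assumed): 20 matching pairs and two retrieval results with errors.
random_pool = [(f"q{i}", f"m{i}", 1) for i in range(20)]
knn_results = {"q0": ["m3", "m0", "m7"], "q1": ["m1", "m4"]}
true_match = {"q0": "m0", "q1": "m1"}

hard_pairs = mine_hard_pairs(knn_results, true_match)
batch = build_batch(random_pool, hard_pairs)
```

In a training loop, `mine_hard_pairs` would be re-run at the mining interval (every 5 epochs on slide 15) so that the hard pool tracks the current descriptor's mistakes.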
  • 17. Results. Fine-tuning the Siamese network gives a ~20% improvement over the best baseline.
      Table: map@5
      department     best baseline   partition, map@5    cross-validation, map@5
      Moselle        0.76            training 0.96       validation 0.96
      Bas-Rhin       0.75            validation 0.93     testing 0.91
      M-et-Moselle   0.84            testing 0.97        training 0.97
      Architecture diagram labels: ResNet50, Conv, 3conv, FC, Descriptor.
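All results are reported as map@5. The slides do not spell out the exact variant used, so the following sketch implements one common definition of mean average precision at k, where the average precision of a query is the sum of the precision values at the ranks of its relevant results within the top k, divided by min(k, number of relevant items). Function names and the toy rankings are assumptions.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=5):
    """Average precision over the top-k results of one query."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(k, len(relevant))


def map_at_k(all_ranked, all_relevant, k=5):
    """Mean of the per-query average precision values."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)


# Three toy queries (assumed): correct match ranked first, second, and missing.
rankings = [["a", "b", "c"], ["x", "a", "y"], ["x", "y", "z"]]
relevant = [["a"], ["a"], ["a"]]
score = map_at_k(rankings, relevant, k=5)
```

With a single correct match per query, this definition reduces to the mean reciprocal rank truncated at 5, so a score of 0.96 would mean the correct zone is almost always returned first.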
  • 18. Comparison with the NT-Xent loss.
      Table: map@5 per partition; R is the descriptor size.
      method                     R      training: Moselle   validation: Bas-Rhin   testing: M-et-Moselle
      Contrastive NT-Xent loss   128    0.77                0.62                   0.52
      Contrastive NT-Xent loss   2048   0.83                0.73                   0.61
      Ours                       128    0.96                0.93                   0.97
      Ours                       2048   0.61                0.71                   0.58
      Parameters: total epochs: 100; batch size: 26 pairs; image resolution: 256; NT-Xent τ (temperature): 100; augmentation: color jitter and random rotations.
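For reference, the NT-Xent loss from SimCLR [7] treats each sample's paired view as the positive and every other sample in the batch as a negative, using cosine similarity scaled by the temperature τ. The sketch below is a plain-Python illustration of that definition, not the code used for the comparison. The pairing convention (positives at adjacent indices), the toy embeddings, and τ = 0.5 are assumptions chosen for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nt_xent(z, tau=0.5):
    """NT-Xent loss for 2N embeddings where z[2m] and z[2m + 1] form a positive pair."""
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i + 1 if i % 2 == 0 else i - 1  # index of the positive partner
        logits = [cosine(z[i], z[k]) / tau for k in range(n) if k != i]
        positive = cosine(z[i], z[j]) / tau
        # Numerically stable log-sum-exp over all other samples.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - positive
    return total / n


# Toy embeddings (assumed): positives aligned vs. positives orthogonal.
aligned = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.01], [0.01, 1.0]]
loss_aligned = nt_xent(aligned)
loss_misaligned = nt_xent(misaligned)
```

Because cosine similarity lies in [-1, 1], a large temperature flattens the logits, so the choice of τ strongly affects how sharply positives are separated from negatives.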
  • 19. Ablation & error analysis. Figure: erroneous matches returned by the KNN algorithm; most errors concern forested zones.
      Final descriptor size. Table: map@5 for different descriptor sizes R (latest results in brackets); *unstable training.
      R      training: Moselle   validation: Bas-Rhin   testing: M-et-Moselle
      128    0.96                0.93                   0.97
      256    0.96                0.92                   0.97
      512*   0.86                0.70                   0.80
  • 20. Ablation & parameters. Color: grayscale semantic data vs RGB. Normalization: batch normalization in all the added layers vs none. Activation: tanh activation in all the added layers. Losses: focal loss vs BCE loss. Image size: 256 and 512; however, the second allowed only very small batches and led to unstable training.
  • 21. Conclusion. ▪ A novel approach for learning from multi-modal data that uses fusion to fine-tune any CNN-based image descriptor, so any backbone can be used. ▪ The resulting descriptor is powerful enough to distinguish between semantically close images and is robust to gradual landscape changes over time: with just 128 values in a single descriptor, map@5 averaged over the test and validation sets is 0.94. ▪ A new multi-modal dataset extracted from the IGN BD TOPO/ORTHO, a unique and rich source of information for geographic exploration.
  • 22. Questions & Answers. Thank you for your attention! To find out more, please check out: 1. The publication: Margarita Khokhlova, Valerie Gouet-Brunet, Nathalie Abadie and Liming Chen. Recherche multimodale d’images aériennes multi-date à l’aide d’un réseau siamois, RFIAP 2020. 2. The code: https://github.com/margokhokhlova/siamese_net Or contact me: margarita.khokhlova@ign.fr