CARIGANs
UNPAIRED PHOTO-TO-CARICATURE TRANSLATION
AGENDA
❖ GANs
❖ CariGANs: Unpaired Photo-to-Caricature Translation
❖ How are they doing it?
❖ CariGANs’ Performance
❖ Extensions / Scope
GANS (GENERATIVE ADVERSARIAL NETWORKS)
Generative adversarial networks (GANs) are deep neural network architectures composed of two networks pitted against each other (hence "adversarial").
Generative and discriminative networks differ in what they model. A discriminative network maps features to labels and is concerned solely with that correlation. One way to think about generative algorithms is that they do the opposite: instead of predicting a label given certain features, they attempt to predict features given a certain label.
How GANs work
One neural network, the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity; i.e., the discriminator decides whether each instance it reviews belongs to the actual training dataset or not.
The discriminator is a standard convolutional network that categorizes the images fed to it: a binary classifier labeling images as real or fake.
The generator is, in a sense, an inverse convolutional network: while a standard convolutional classifier takes an image and downsamples it to produce a probability, the generator takes a vector of random noise and upsamples it to an image.
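The generator/discriminator game described above can be sketched with a few lines of Python. This is a minimal illustration, assuming the standard binary cross-entropy GAN objective (the slides do not name a loss at this point); the function names and the example probabilities are hypothetical.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one prediction; prob is clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def discriminator_loss(d_real, d_fake):
    """Real samples should be scored 1, generated samples 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating form: the generator wants its samples scored as real."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs: probability that each sample is real.
d_real = [0.9, 0.8]
d_fake = [0.1, 0.3]
print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))
```

A confident, correct discriminator yields a low discriminator loss and a high generator loss; training alternates between lowering each of the two.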
GAN
CARIGANS: UNPAIRED PHOTO-TO-CARICATURE TRANSLATION
The authors propose the first Generative Adversarial Network (GAN) for unpaired photo-to-caricature translation, which they call "CariGANs".
CARIGAN
WHAT ARE THEY TRYING TO ACHIEVE?
1. Given a real photo, learn a mapping that converts it to a caricature.
2. The exaggerated shape should maintain the relative geometric locations of facial components and emphasize only the features that distinguish the subject from others.
3. The final appearance should be faithful to the visual styles of caricatures and preserve the identity of the input face, as addressed in other face generators.
4. Moreover, the generation must be diverse and controllable: given one input face photo, it should allow generating different types of caricatures, and even controlling the results either by an example caricature or by user interaction.
5. Formally, we want to learn a mapping Φ : X → Y that transfers an input x ∈ X to a sample y = Φ(x), y ∈ Y.
PREVIOUS EFFORTS
1. Earlier efforts mostly relied on supervised learning: given a real photo paired with a caricature image, the network learns how to convert the photo into the caricature.
2. Unfortunately, most photo and caricature examples are unpaired. Building a dataset with thousands of image pairs (i.e., a face photo and its associated caricature drawn by an artist) would be too expensive and tedious.
DATA COLLECTION
Let X and Y be the face photo domain and the caricature domain respectively, where no pairing exists between the two domains.
1. For the photo domain X, we randomly sample 10,000 face images from the CelebA database, {x_i}, i = 1, ..., N, x_i ∈ X, covering diverse genders, races, ages, expressions, and poses.
2. To obtain the caricature domain Y, we collect 8,451 hand-drawn portrait caricatures from the Internet with different drawing styles (e.g., cartoon, pencil drawing) and various exaggerated facial features, {y_i}, i = 1, ..., M, y_i ∈ Y.
HOW ARE THEY DOING IT?
The pipeline is split into two GANs:
1. CariGeoGAN, which models only the geometry-to-geometry transformation from face photos to caricatures.
2. CariStyGAN, which transfers the style appearance from caricatures to face photos without any geometry deformation.
3. The two GANs are trained separately, one per task, which makes the learning more robust.
4. To relate the unpaired images, both CariGeoGAN and CariStyGAN use cycle-consistency network structures, which are widely used in cross-domain and unsupervised image translation.
5. Finally, the exaggerated shape (obtained from CariGeoGAN) is used to exaggerate the stylized face (obtained from CariStyGAN) via image warping. The warping is done by a differentiable spline interpolation module.
6. In CariGeoGAN, we use the PCA representation of facial landmarks, instead of the landmarks themselves, as the input and output of the GAN. This representation implicitly enforces a face-shape prior in the network.
7. In the second stage, we use CariStyGAN to learn the appearance-to-appearance translation from photo to caricature while preserving geometry. Here we need to synthesize an intermediate result y′ ∈ Y′, which is assumed to be as close as possible to the caricature domain Y in appearance and as similar as possible to the photo domain X in shape.
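The data flow between the stages listed above can be sketched as follows. This only illustrates how the pieces compose; it is not the authors' implementation. All function names are hypothetical, the two "GANs" are toy stand-ins rather than trained networks, and the warp is a placeholder that records its inputs instead of performing spline interpolation.

```python
import numpy as np

def carigans_pipeline(photo, landmarks, geo_gan, sty_gan, warp):
    """Compose the stages: exaggerate geometry, stylize appearance, then warp."""
    exaggerated = geo_gan(landmarks)   # geometry-to-geometry (CariGeoGAN role)
    stylized = sty_gan(photo)          # appearance only, shape unchanged (CariStyGAN role)
    return warp(stylized, landmarks, exaggerated)

# Toy stand-ins (hypothetical, NOT the trained networks).
def toy_geo_gan(landmarks, scale=1.3):
    """Push every landmark away from the landmark centroid."""
    center = landmarks.mean(axis=0)
    return center + scale * (landmarks - center)

def toy_sty_gan(photo):
    """Quantize intensities to four levels as a crude 'style' change."""
    return np.floor(photo * 4) / 4

def toy_warp(image, src, dst):
    """Placeholder: returns the image with source and target landmarks, without warping."""
    return {"image": image, "src": src, "dst": dst}

rng = np.random.default_rng(0)
photo = rng.random((8, 8))              # tiny grayscale stand-in for a face photo
landmarks = rng.random((63, 2)) * 8     # 63 (x, y) landmarks
result = carigans_pipeline(photo, landmarks, toy_geo_gan, toy_sty_gan, toy_warp)
print(result["dst"].shape)
```

Keeping the geometry and appearance stages behind separate callables mirrors the point in the list that each GAN is trained for its own task and the results meet only at the warping step.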
GEOMETRY DEFORMATION
1. Face shape can be represented by 2D facial landmarks, both for real face photos X and for caricatures Y.
2. We manually label 63 facial landmarks for each image in both X and Y.
3. To centralize all facial shapes, all images in both X and Y are aligned to the mean face with an affine transformation defined by three landmarks (the centers of the two eyes and the center of the mouth).
4. In addition, all images are cropped to the face region and resized to 256 × 256 resolution to normalize the scale.
5. Note that the dataset of real faces is also used to fine-tune a landmark detector, which supports automatic landmark detection at the inference stage.
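The three-landmark alignment step can be sketched as below. Three non-collinear point pairs determine a 2D affine transform exactly, so it can be recovered by solving a small linear system. The helper names and all coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np

def affine_from_three_points(src, dst):
    """Return the 2x3 affine matrix A with A @ [x, y, 1] mapping src[i] to dst[i]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_h = np.hstack([src, np.ones((3, 1))])   # 3x3 homogeneous source points
    # src_h @ A.T = dst, so A.T is the solution of this linear system.
    return np.linalg.solve(src_h, dst).T

def apply_affine(A, points):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    points = np.asarray(points, dtype=float)
    return points @ A[:, :2].T + A[:, 2]

# Hypothetical anchor positions: left eye center, right eye center, mouth center.
mean_anchors = [[88.0, 110.0], [168.0, 110.0], [128.0, 190.0]]
face_anchors = [[70.0, 95.0], [150.0, 105.0], [105.0, 180.0]]

A = affine_from_three_points(face_anchors, mean_anchors)
aligned = apply_affine(A, face_anchors)
print(aligned)
```

The same matrix would then be applied to the image and to all of its landmarks so that every face shares the mean face's eye and mouth positions.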
GEOMETRY DEFORMATION (CONTD)
1. For CariGeoGAN, to further reduce the dimensionality of its input and output, we apply principal component analysis (PCA) to the landmarks of all samples in X and Y. We keep the top 32 principal components, which recover 99.03% of the total variance. The 63 landmarks of each sample are then represented by a vector of 32 PCA coefficients. This representation helps constrain the face structure while the mapping is learned.
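The PCA representation of landmarks can be sketched with NumPy. This is a minimal illustration on synthetic data, assuming each shape is the 63 landmarks flattened to a 126-dimensional vector; the helper names and the random data are hypothetical and do not reproduce the 99.03% figure.

```python
import numpy as np

def fit_pca(shapes, n_components=32):
    """shapes: (n_samples, 126). Returns mean, principal axes, explained-variance ratio."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    ratio = var[:n_components].sum() / var.sum()
    return mean, vt[:n_components], ratio

def encode(shape, mean, components):
    """Project a flattened shape onto the principal axes."""
    return (shape - mean) @ components.T

def decode(coeffs, mean, components):
    """Map PCA coefficients back to flattened landmark coordinates."""
    return coeffs @ components + mean

rng = np.random.default_rng(0)
# Synthetic stand-in for landmark data: low-rank structure plus small noise.
latent = rng.normal(size=(500, 20))
basis = rng.normal(size=(20, 126))
shapes = latent @ basis + 0.01 * rng.normal(size=(500, 126))

mean, comps, ratio = fit_pca(shapes)
coeffs = encode(shapes[0], mean, comps)   # 32 numbers instead of 126
recon = decode(coeffs, mean, comps)
print(coeffs.shape, ratio)
```

Because the network only ever outputs 32 coefficients, any prediction decodes to a combination of shape variations seen in the data, which is one way to read the statement that the representation constrains face structure.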
CARIGEOGAN
1. CariGeoGAN is inspired by the network architecture of CycleGAN.
2. The forward generator learns the mapping from photo landmarks to caricature landmarks, while the backward generator learns the inverse mapping.
3. As in CycleGAN, the same forward and backward generators, with shared weights, are used in both cycle paths (X → Y → X and Y → X → Y).
4. The discriminators learn to distinguish real samples from generated samples.
CARIGEOGAN (SUMMARIZED)
CARIGEOGAN (LOSSES)
We define three types of loss in CariGeoGAN.
1. The first is the adversarial loss, which is widely used in GANs. Specifically, the authors adopt the adversarial loss of LSGAN to encourage generating landmarks that are indistinguishable from the hand-drawn caricature landmarks sampled from domain Y. A symmetric loss is defined for generating portrait-photo landmarks that cannot be distinguished from those of real photos.
2. The second is the bidirectional cycle-consistency loss, also used in CycleGAN, which constrains the cycle consistency between the forward mapping and the reverse mapping. The idea is that applying the reverse mapping to the synthesized output should recover the original input.
3. The cycle-consistency loss further helps constrain the mapping from input to output. However, it is too weak to guarantee that the predicted deformation captures the distinctive facial features and exaggerates them. The third is therefore a new characteristic loss, which penalizes the cosine difference between the input landmarks and the predicted landmarks after subtracting the mean of their respective domains.
● The characteristic loss in the reverse direction is defined similarly.
● The underlying idea is that the differences between a face and the mean face represent its most distinctive features and thus should be kept after exaggeration. For example, if a face has a larger nose than a normal face, this distinctiveness will be preserved or even exaggerated after conversion to a caricature.
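The three loss types can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the adversarial terms use the standard least-squares (LSGAN) form, the cycle term is assumed to be an L1 distance as in CycleGAN, and the characteristic term is written as one minus cosine similarity. All function names and the toy inputs are hypothetical.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """LSGAN discriminator loss: real scores pushed to 1, generated scores to 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """LSGAN generator loss: generated scores pushed to 1."""
    return np.mean((d_fake - 1.0) ** 2)

def cycle_loss(x, x_cycled):
    """L1 distance between inputs and their forward-then-reverse reconstruction."""
    return np.mean(np.sum(np.abs(x_cycled - x), axis=1))

def characteristic_loss(x, y_pred, mean_x, mean_y):
    """1 - cosine similarity between (input - its mean) and (prediction - target mean)."""
    a = x - mean_x
    b = y_pred - mean_y
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    cos = np.sum(a * b, axis=1) / norms
    return np.mean(1.0 - cos)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 32))            # toy batch of 32-dim PCA coefficients
mean_x = np.zeros(32)
mean_y = np.zeros(32)

# Exaggerating along the same direction keeps the loss near 0;
# flipping the direction drives it toward 2.
same_direction = characteristic_loss(x, 1.5 * x, mean_x, mean_y)
opposite = characteristic_loss(x, -x, mean_x, mean_y)
print(same_direction, opposite)
```

The toy check shows why a cosine-based term suits exaggeration: it is indifferent to how far a feature is pushed from the mean and only penalizes changing the direction of the deviation.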
Our objective function for optimizing CariGeoGAN combines these three losses.
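A sketch of such an objective, assuming the adversarial, cycle-consistency, and characteristic losses are combined as a weighted sum. The symbols below (the superscripts X and Y for the two domains, and the weights λ_cyc and λ_cha) are notation introduced here for illustration, and the weight values are left unspecified.

```latex
\mathcal{L}_{geo} \;=\; \mathcal{L}_{adv}^{X} \;+\; \mathcal{L}_{adv}^{Y}
\;+\; \lambda_{cyc}\,\mathcal{L}_{cyc} \;+\; \lambda_{cha}\,\mathcal{L}_{cha}
```

Here the two adversarial terms cover the two translation directions, while the cycle-consistency and characteristic terms each include both the forward and the reverse direction.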
APPEARANCE STYLIZATION
CARIGANS PERFORMANCE
PERFORMANCE
1. Our core algorithm is developed in PyTorch. All experiments are conducted on a PC with an Intel E5 2.6 GHz CPU and an NVIDIA K40 GPU. The total runtime for a 256 × 256 image is approximately 0.14 s: 0.10 s for appearance stylization, 0.02 s for geometric exaggeration, and 0.02 s for image warping.
2. The perceptual study shows that caricatures generated by CariGANs are closer to hand-drawn ones and, at the same time, better preserve identity than state-of-the-art methods.
3. Moreover, CariGANs allow users to control the degree of shape exaggeration and to change the color/texture style by tuning parameters or providing an example caricature.
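One simple way to expose an exaggeration-degree control is sketched below, assuming the degree is a scalar that scales the displacement between the input landmarks and the predicted caricature landmarks. The function name, the parameter `alpha`, and the toy landmarks are hypothetical; the slides only state that the degree is tunable.

```python
import numpy as np

def exaggerate(landmarks, predicted, alpha=1.0):
    """Scale the displacement from input landmarks to predicted landmarks by alpha."""
    return landmarks + alpha * (predicted - landmarks)

# Toy landmarks: three (x, y) points and a hypothetical prediction for them.
landmarks = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 20.0]])
predicted = np.array([[8.0, 9.0], [22.0, 9.0], [15.0, 24.0]])

for alpha in (0.0, 0.5, 1.0, 2.0):
    print(alpha, exaggerate(landmarks, predicted, alpha).tolist())
```

With `alpha = 0` the shape is left unchanged, `alpha = 1` reproduces the predicted shape, and larger values push the same features further.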
EXTENSIONS / SCOPE
Video caricature: We apply CariGANs directly to a video, frame by frame. Since CariGANs exaggerate and stylize the face according to its facial features, the results are overall stable across frames.
Caricature-to-photo translation: Since both CariGeoGAN and CariStyGAN are trained to learn the forward and backward mappings symmetrically, we can reverse the pipeline to convert an input caricature into its corresponding photo.
THANK YOU
REFERENCES
CariGANs: Unpaired Photo-to-Caricature Translation. Kaidi Cao (Tsinghua University), Jing Liao (City University of Hong Kong, Microsoft Research), Lu Yuan (Microsoft AI Perception and Mixed Reality).
Can an AI Learn To Draw a Caricature? Two Minute Papers.

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 

CariGANs: Unpaired Photo-to-Caricature Translation

  • 2. AGENDA ❖ GANs ❖ CariGANs: Unpaired Photo-to-Caricature Translation ❖ How are they doing it? ❖ CariGANs’ Performance ❖ Extensions / Scope
  • 4. GAN Generative adversarial networks (GANs) are deep neural network architectures composed of two networks pitted against each other (hence “adversarial”). Discriminative networks map features to labels and are concerned solely with that correlation. Generative algorithms can be thought of as doing the opposite: instead of predicting a label given certain features, they attempt to predict features given a certain label. How GANs work: one neural network, the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity; that is, the discriminator decides whether each instance it reviews belongs to the actual training dataset. The discriminator is a standard convolutional network that categorizes the images fed to it: a binary classifier labeling images as real or fake. The generator is, in a sense, an inverse convolutional network: while a standard convolutional classifier takes an image and downsamples it to produce a probability, the generator takes a vector of random noise and upsamples it to an image.
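The generator/discriminator roles described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the CariGANs implementation: the single-layer "networks", the 16-dimensional noise, the 8 × 8 image size, and the names `generator`, `discriminator`, and `bce` are all assumptions made for the example, and no training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W):
    """Map noise vectors (batch, 16) to flat 8x8 'images' with values in [-1, 1]."""
    return np.tanh(z @ W)

def discriminator(x, v):
    """Map flat images (batch, 64) to a probability of being real."""
    return 1.0 / (1.0 + np.exp(-(x @ v)))

def bce(p, target):
    """Binary cross-entropy between probabilities p and a constant 0/1 target."""
    eps = 1e-12
    return float(-np.mean(target * np.log(p + eps)
                          + (1 - target) * np.log(1 - p + eps)))

# Hypothetical, untrained weights for both players.
W = rng.normal(scale=0.1, size=(16, 64))
v = rng.normal(scale=0.1, size=64)

real = rng.uniform(-1, 1, size=(4, 64))          # stand-in for real training images
fake = generator(rng.normal(size=(4, 16)), W)    # noise upsampled to images

# The discriminator is rewarded for labeling real as 1 and fake as 0.
d_loss = bce(discriminator(real, v), 1.0) + bce(discriminator(fake, v), 0.0)
# The generator is rewarded when the discriminator labels its output as real.
g_loss = bce(discriminator(fake, v), 1.0)
```

In a real GAN the two losses are minimized alternately, each with respect to its own network's weights.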
  • 6. CARIGAN CariGANs are the first generative adversarial networks (GANs) for unpaired photo-to-caricature translation.
  • 7. WHAT ARE THEY TRYING TO ACHIEVE? 1. Given a real photo, learn a mapping that converts it to a caricature. 2. The exaggerated shape should maintain the relative geometric locations of facial components and emphasize only the features that distinguish the subject from others. 3. The final appearance should be faithful to the visual styles of caricatures while keeping the identity of the input face, as addressed in other face generators. 4. The generation must be diverse and controllable: given one input face photo, it should allow generating varied types of caricatures and controlling the results either by an example caricature or by user interaction. 5. Formally, we want to learn a mapping Φ : X → Y that transfers an input x ∈ X to a sample y = Φ(x), y ∈ Y.
  • 8. PREVIOUS EFFORTS 1. Earlier efforts mostly used supervised learning: given real photos paired with caricatures, the network learns to convert a photo into its caricature. 2. Most photo and caricature examples in the world are unpaired, however. Building a dataset with thousands of image pairs (a face photo and its associated caricature drawn by an artist) would be too expensive and tedious.
  • 9. DATA COLLECTION Let X and Y be the face photo domain and the caricature domain respectively, where no pairing exists between the two domains. 1. For the photo domain X, we randomly sample 10,000 face images from the CelebA database, {x_i}, i = 1, ..., N, x_i ∈ X, covering diverse genders, races, ages, expressions, and poses. 2. For the caricature domain Y, we collect 8,451 hand-drawn portrait caricatures from the Internet with different drawing styles (e.g., cartoon, pencil drawing) and various exaggerated facial features, {y_i}, i = 1, ..., M, y_i ∈ Y.
  • 10. HOW ARE THEY DOING IT?
  • 11. HOW ARE THEY DOING IT? The pipeline is split into two GANs: 1. CariGeoGAN, which models only the geometry-to-geometry transformation from face photos to caricatures. 2. CariStyGAN, which transfers the style appearance from caricatures to face photos without any geometry deformation. 3. The two GANs are trained separately, one per task, which makes the learning more robust. 4. To relate unpaired images, both CariGeoGAN and CariStyGAN use cycle-consistency network structures, which are widely used in cross-domain or unsupervised image translation. 5. Finally, the exaggerated shape (obtained from CariGeoGAN) is used to warp the stylized face (obtained from CariStyGAN). The warping is done by a differentiable spline interpolation module. 6. In CariGeoGAN, we use the PCA representation of facial landmarks, rather than the landmarks themselves, as the input and output of the GAN. This representation implicitly enforces a face-shape prior in the network. 7. In the second stage, we use CariStyGAN to learn the appearance-to-appearance translation from photo to caricature while preserving geometry. Here, we need to synthesize an intermediate result y′ ∈ Y′, which is assumed to be as close as possible to the caricature domain Y in appearance and to the photo domain X in shape.
  • 12. HOW ARE THEY DOING IT?
  • 13. GEOMETRY DEFORMATION 1. Face shape can be represented by 2D face landmarks, both for real face photos X and for caricatures Y. 2. We manually label 63 facial landmarks for each image in both X and Y.
  • 14. GEOMETRY DEFORMATION (CONTD) 1. To centralize all facial shapes, all images in both X and Y are aligned to the mean face via three landmarks (the centers of the two eyes and the center of the mouth) using an affine transformation. 2. In addition, all images are cropped to the face region and resized to 256 × 256 resolution to normalize the scale. 3. The dataset of real faces is also used to fine-tune a landmark detector, which supports automatic landmark detection at inference. 4. For CariGeoGAN, to further reduce the dimensionality of its input and output, we apply principal component analysis (PCA) to the landmarks of all samples in X and Y. We keep the top 32 principal components, which recover 99.03% of the total variance. The 63 landmarks of each sample are then represented by a vector of 32 PCA coefficients. This representation helps constrain the face structure while the mapping is learned.
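The PCA step on this slide (63 landmarks, i.e. 126 coordinates, reduced to 32 coefficients) can be sketched as follows. This is an illustration under stated assumptions: the landmark data is synthetic, the SVD-based fit is one standard way to compute PCA rather than the authors' code, and the names `fit_landmark_pca`, `encode`, and `decode` are hypothetical.

```python
import numpy as np

def fit_landmark_pca(shapes, n_components=32):
    """Fit PCA on shapes of size (n_samples, 63, 2).

    Returns the mean vector, the top components, and the fraction of
    total variance those components explain.
    """
    flat = shapes.reshape(len(shapes), -1)            # (n_samples, 126)
    mean = flat.mean(axis=0)
    _, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    var = s ** 2                                      # proportional to per-component variance
    ratio = float(var[:n_components].sum() / var.sum())
    return mean, vt[:n_components], ratio

def encode(shape, mean, components):
    """Project one (63, 2) landmark set onto the PCA basis -> (32,) coefficients."""
    return components @ (shape.reshape(-1) - mean)

def decode(coeffs, mean, components):
    """Reconstruct (63, 2) landmarks from PCA coefficients."""
    return (mean + coeffs @ components).reshape(-1, 2)

# Synthetic stand-in for labeled landmarks: a base face plus a few
# low-dimensional modes of variation and a little noise (an assumption).
rng = np.random.default_rng(0)
base = rng.uniform(0, 256, size=(63, 2))
modes = rng.normal(size=(10, 63, 2))
weights = rng.normal(scale=5.0, size=(200, 10))
shapes = (base + np.tensordot(weights, modes, axes=1)
          + rng.normal(scale=0.1, size=(200, 63, 2)))

mean, components, ratio = fit_landmark_pca(shapes)
coeffs = encode(shapes[0], mean, components)
recon = decode(coeffs, mean, components)
```

Because a generator that outputs 32 coefficients can only produce shapes inside the span of the learned components, the representation acts as a shape prior, which is the constraint the slide refers to.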
  • 15. CARIGEOGAN 1. CariGeoGAN is inspired by the network architecture of CycleGAN. 2. The forward generator learns the mapping from photo landmarks to caricature landmarks, while the backward generator learns the inverse mapping. 3. As in CycleGAN, each generator appears in both cycle paths and shares its weights across them. 4. The discriminators learn to distinguish real samples from generated samples.
  • 17. CARIGEOGAN (LOSSES) We define three types of loss in CariGeoGAN. 1. The first is the adversarial loss, which is widely used in GANs. Specifically, the authors adopt the adversarial loss of LSGAN to encourage generating landmarks indistinguishable from the hand-drawn caricature landmarks sampled from domain Y. A symmetric loss is defined for generating portrait photo landmarks that cannot be distinguished from those of real photos. 2. The second is the bidirectional cycle-consistency loss, also used in CycleGAN, which constrains the cycle consistency between the forward and reverse mappings. The idea is that applying the reverse mapping to the synthesized output should recover the original input.
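The two losses on this slide can be written down compactly. The sketch below uses the standard least-squares (LSGAN) form and an L1 cycle-consistency term as in CycleGAN; the choice of L1 norm, the function names, and the toy linear "generators" are assumptions for illustration, not details confirmed by the slides.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push scores on real toward 1, on fake toward 0."""
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push discriminator scores on fakes toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))

def cycle_loss(x, y, g_xy, g_yx):
    """Bidirectional cycle consistency: X -> Y -> X and Y -> X -> Y should return the input."""
    forward = np.mean(np.abs(g_yx(g_xy(x)) - x))
    backward = np.mean(np.abs(g_xy(g_yx(y)) - y))
    return float(forward + backward)

# Toy generators that are exact inverses of each other (hypothetical).
def g_xy(v):
    return 1.5 * v + 0.2

def g_yx(v):
    return (v - 0.2) / 1.5

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))   # stand-in for photo landmark coefficients
y = rng.normal(size=(8, 32))   # stand-in for caricature landmark coefficients

perfect_cycle = cycle_loss(x, y, g_xy, g_yx)          # ~0 for exact inverses
broken_cycle = cycle_loss(x, y, g_xy, lambda v: v)    # > 0 when the inverse is wrong
```

The cycle term is what lets training proceed without paired data: it never compares a photo with a specific caricature, only an input with its own round trip.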
  • 18. CARIGEOGAN (LOSSES) 3. The cycle-consistency loss further helps constrain the mapping from input to output. However, it is too weak to guarantee that the predicted deformation captures the distinct facial features and exaggerates them. The third loss is a new characteristic loss, which penalizes the cosine difference between the input landmarks and the predicted ones after subtracting their corresponding means. ● The characteristic loss in the reverse direction is defined similarly. ● The underlying idea is that the differences between a face and the mean face represent its most distinctive features and should therefore be kept after exaggeration. For example, if a face has a larger nose than a normal face, this distinctiveness will be preserved or even exaggerated after conversion to a caricature. Our objective function for optimizing CariGeoGAN is:
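The characteristic loss described on this slide can be sketched as one minus the cosine similarity between mean-subtracted input and output vectors. The exact formulation (1 − cos, averaged over the batch), the function name, and the toy data are assumptions made for illustration.

```python
import numpy as np

def characteristic_loss(inputs, outputs, mean_in, mean_out):
    """Mean of (1 - cosine similarity) between offsets from each domain's mean.

    inputs, outputs: (batch, dim) landmark vectors before and after mapping.
    mean_in, mean_out: (dim,) mean vectors of the source and target domains.
    """
    a = inputs - mean_in
    b = outputs - mean_out
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12)
    return float(np.mean(1.0 - cos))

rng = np.random.default_rng(0)
photos = rng.normal(size=(5, 32))        # stand-in for photo landmark coefficients
mean_x = photos.mean(axis=0)
mean_y = rng.normal(size=32)             # hypothetical caricature-domain mean

# Exaggeration along the same direction (offset doubled) is not penalized.
exaggerated = mean_y + 2.0 * (photos - mean_x)
loss_same_direction = characteristic_loss(photos, exaggerated, mean_x, mean_y)

# Reversing the distinctive direction (big nose -> small nose) is penalized most.
reversed_out = mean_y - (photos - mean_x)
loss_reversed = characteristic_loss(photos, reversed_out, mean_x, mean_y)
```

Because cosine similarity ignores magnitude, the loss leaves the amount of exaggeration free and only constrains its direction, which matches the slide's point that distinctiveness should be "preserved or even exaggerated".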
  • 21. PERFORMANCE 1. Our core algorithm is developed in PyTorch. All experiments are conducted on a PC with an Intel E5 2.6 GHz CPU and an NVIDIA K40 GPU. The total runtime for a 256 × 256 image is approximately 0.14 s: 0.10 s for appearance stylization, 0.02 s for geometric exaggeration, and 0.02 s for image warping. 2. The perceptual study shows that caricatures generated by CariGANs are closer to hand-drawn ones and better preserve identity than state-of-the-art methods. 3. Moreover, CariGANs allow users to control the degree of shape exaggeration and change the color/texture style by tuning parameters or giving an example caricature.
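One plausible way to expose an exaggeration-degree parameter is to scale the displacement between the input landmarks and the predicted caricature landmarks. The slides do not state how the control is implemented, so the linear blend below, the parameter name `alpha`, and the function name `exaggerate` are all assumptions.

```python
import numpy as np

def exaggerate(landmarks, predicted, alpha):
    """Blend input landmarks toward (or beyond) the predicted caricature landmarks.

    alpha = 0 keeps the original shape, alpha = 1 gives the predicted shape,
    and alpha > 1 pushes the exaggeration further along the same displacement.
    """
    return landmarks + alpha * (predicted - landmarks)

# Toy 63-point shapes on a 256 x 256 canvas (hypothetical data).
rng = np.random.default_rng(0)
photo_pts = rng.uniform(0, 256, size=(63, 2))
cari_pts = photo_pts + rng.normal(scale=8.0, size=(63, 2))

no_change = exaggerate(photo_pts, cari_pts, 0.0)
as_predicted = exaggerate(photo_pts, cari_pts, 1.0)
stronger = exaggerate(photo_pts, cari_pts, 2.0)
```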
  • 23. EXTENSIONS / SCOPE Video caricature: we apply CariGANs directly to video, frame by frame. Since CariGANs exaggerate and stylize the face according to its facial features, the results are stable overall across frames.
  • 24. EXTENSIONS / SCOPE Caricature-to-photo translation: since both CariGeoGAN and CariStyGAN are trained to learn the forward and backward mappings symmetrically, we can reverse the pipeline to convert an input caricature into its corresponding photo.
  • 26. REFERENCES CariGANs: Unpaired Photo-to-Caricature Translation. Kaidi Cao (Tsinghua University), Jing Liao (City University of Hong Kong, Microsoft Research), Lu Yuan (Microsoft AI Perception and Mixed Reality). Can an AI Learn To Draw a Caricature? (Two Minute Papers).