Deepfakes: An Emerging Internet Threat and their Detection

Dr. Symeon (Akis) Papadopoulos – @sympap
MeVer Team @ Information Technologies Institute (ITI) /
Centre for Research & Technology Hellas (CERTH)
In collaboration with Polychronis Charitidis, George Kordopatis-Zilos,
Nikos Sarris and Yiannis Kompatsiaris
AI4EU Café, Dec 16th 2020
Media Verification
(MeVer)

DeepFakes: Definition
• Content generated by deep neural networks that seems authentic to the human eye
• Most common form: generation and manipulation of human faces
Source: https://en.wikipedia.org/wiki/Deepfake
Source: https://www.youtube.com/watch?v=iHv6Q9ychnA
Source: Media Forensics and DeepFakes: an overview

1 DeepFakes in the News
2 DeepFake Basics
3 DeepFakes Detection
4 Our Lessons Learned

State of DeepFakes
• Rapid increase of DF content online
• Majority of DF content is pornographic
• Significant reach
• Vast majority of subjects in DF pornographic videos are actresses and musicians
• Subjects in YouTube DF videos also include politicians and business people
Ajder, H., Patrini, G., Cavalli, F., Cullen, L. (2019). The State of DeepFakes: Landscape, Threats and Impact. Report by DeepTraceLabs/Sensity.

Gaining popularity
Nguyen, T. T., et al. (2019). Deep learning for deepfakes creation and detection. arXiv preprint arXiv:1909.11573.
Ajder, H., et al. (2019). The State of DeepFakes: Landscape, Threats and Impact. Report by DeepTraceLabs/Sensity.

https://www.wired.com/story/telegram-still-hasnt-removed-an-ai-bot-thats-abusing-women/
“The bot uses a version of the DeepNude AI tool, which was originally created in 2019, to remove clothes from photos of women and generate their body parts. Anyone can easily use the bot to generate images. More than 100,000 such images have been publicly shared by the bot in several Telegram chat channels associated with it.”

DeepFakes and Privacy Risks
https://www.androidpolice.com/2019/09/03/zao-deepfake-app-privacy/
The app quickly garnered negative press focusing on privacy concerns in heavily surveilled China, of all places. Reporters cited the user agreement, which gives the company behind ZAO the right to use any imagery created on the app for free and for all purposes, with no option to retreat from the consent once accepted. ZAO has since responded and updated the agreement, writing that it changed the controversial passages and that it would remove any user-deleted content from its servers, too…

Reface: The Normalization of DeepFakes
“The app normalises deepfakes, and not everyone understands the concerns arising from them because not everyone has the digital know-how to differentiate what is real and what isn’t,” Apurva Singh, a privacy expert and volunteer legal counsel at Software Freedom Law Center, India…
https://www.vice.com/en/article/wxqkbn/viral-reface-app-going-to-make-deepfake-problem-worse

Fake Identities
But Katie Jones doesn’t exist, The Associated Press has determined. Instead, the persona was part of a vast army of phantom profiles lurking on the professional networking site LinkedIn. And several experts contacted by the AP said Jones’ profile picture appeared to have been created by a computer program…
https://apnews.com/article/bc2f19097a4c4fffaa00de6770b8a60d

DeepFakes and Politics
One week after the video’s release, Gabon’s military
attempted an ultimately unsuccessful coup—the country’s
first since 1964—citing the video’s oddness as proof
something was amiss with the president.
https://www.motherjones.com/politics/2019/03/deepfake-gabon-ali-bongo/
Mr Nguyen said he could not rule out the video being a
‘deepfake’, a term for the fairly new artificial intelligence
based technology which involves machine learning
techniques to superimpose a face on a video.
https://www.sbs.com.au/news/a-gay-sex-tape-is-threatening-to-end-the-political-careers-of-two-men-in-malaysia

https://www.npr.org/2020/10/01/918223033/where-are-the-deepfakes-in-this-presidential-election?t=1607638691173
https://mobile.twitter.com/SilERabbit/status/1254551597465518082

Manipulation types
Facial manipulations can be categorised into four main groups:
• Entire face synthesis
• Attribute manipulation
• Identity swap
• Expression swap
Source: DeepFakes and Beyond: A Survey of Face Manipulation and Fake
Detection (Tolosana et al., 2020)
Tolosana, R., et al. (2020). Deepfakes and beyond:
A survey of face manipulation and fake
detection. arXiv preprint arXiv:2001.00179.
Verdoliva, L. (2020). Media forensics and deepfakes:
an overview. arXiv preprint arXiv:2001.06564.
Mirsky, Y., & Lee, W. (2020). The Creation and
Detection of Deepfakes: A Survey. arXiv preprint
arXiv:2004.11138.
Figure labels: reenactment, replacement, editing

Basic Principle: Encoder/Decoder Scheme
https://jonathan-hui.medium.com/how-deep-learning-fakes-videos-deepfakes-and-how-to-detect-it-c0b50fbf7cb9
Face angle, skin tone,
facial expression, lighting
Person 1-specific features
Person 2-specific features

Common DF Neural Network Architectures
Mirsky, Y., & Lee, W. (2020). The Creation and Detection of Deepfakes: A Survey. arXiv
preprint arXiv:2004.11138.

DeepFake Creation Pipeline
Mirsky, Y., & Lee, W. (2020). The Creation and Detection of Deepfakes: A Survey. arXiv
preprint arXiv:2004.11138.

DF Tools
• FaceSwap: https://faceswap.dev/
• DeepFaceLab: https://github.com/iperov/DeepFaceLab
• Dfaker: https://github.com/dfaker/df
• StyleGAN2: https://github.com/NVlabs/stylegan2

DF Quality Rapidly Improving
https://twitter.com/goodfellow_ian/status/1084973596236144640

Signs of a DeepFake (in 2020)
• Different kinds of artifacts
• Blurry areas around lips, hair, earlobes
• Lack of symmetry
• Lighting inconsistencies
• Fuzzy background
• Flickering (in video)
https://apnews.com/article/bc2f19097a4c4fffaa00de6770b8a60d

Test your Skills
• Which face is real? https://www.whichfaceisreal.com/
• Can you spot the deepfake video? https://detectfakes.media.mit.edu/

Detection using physiological features
• Exploits eye blinking, a physiological signal that is not well reproduced in synthesized fake videos.
Y. Li, M. Chang, and S. Lyu, “In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking,” in Proc. IEEE International
Workshop on Information Forensics and Security, 2018.
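To make the blinking cue concrete, here is a minimal sketch that counts blinks from per-frame eye landmarks. It is a swapped-in illustration, not the method of the cited paper: it uses the eye aspect ratio (EAR) heuristic with a fixed threshold. The six-landmark layout, the threshold of 0.2 and the minimum run of 2 frames are all assumptions.

```python
import math

def eye_aspect_ratio(eye):
    """EAR for six (x, y) eye landmarks p1..p6.

    Assumed layout: p1 and p4 are the eye corners, p2/p3 lie on the
    upper lid and p6/p5 on the lower lid (hypothetical ordering).
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least `min_frames` consecutive frames with EAR below `threshold`."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink that lasts until the final frame
        blinks += 1
    return blinks

# Toy data: an open eye, then a per-frame EAR series with one real blink
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
ear_series = [0.30, 0.31, 0.10, 0.08, 0.30, 0.30, 0.09, 0.30, 0.29]
print(eye_aspect_ratio(open_eye), count_blinks(ear_series))
```

A video whose blink count over a long span is implausibly low would then be flagged for closer inspection; what counts as implausible is left open here.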

Detection using physiological features
• Observes subtle changes of colour and motion in RGB videos, which enable methods such as colour-based remote photoplethysmography (rPPG or iPPG)
Ciftci, U. A., Demir, I., & Yin, L. (2020). Fakecatcher: Detection of synthetic portrait videos using biological signals. IEEE Transactions
on Pattern Analysis and Machine Intelligence.

Detection using head pose features
• Exploits the errors that deepfake generation methods can introduce in 3D head poses.
Yang, X., Li, Y., & Lyu, S. (2019, May). Exposing deep fakes using inconsistent head poses. In ICASSP 2019-2019 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP) (pp. 8261-8265). IEEE.

Artifact-based detection methods
• Exploiting artifacts from specific generation methods
Visual artifacts: Matern, F., Riess, C., & Stamminger, M. (2019, January). Exploiting visual artifacts to expose deepfakes and face manipulations. In 2019 IEEE Winter Appl. of Computer Vision Workshops (WACVW) (pp. 83-92).
Limited resolution: Li, Y., & Lyu, S. (2019). Exposing DeepFake Videos By Detecting Face Warping Artifacts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 46-52).

Artifact-based detection methods
• Exploits the deepfake generation step of blending the altered face into an existing background image
• Localizes the manipulation region of the face
Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., & Guo, B. (2020). Face x-ray for more general face forgery detection. In Proc. of IEEE/CVF
Conference on Computer Vision and Pattern Recognition (pp. 5001-5010).

CNN-based approaches
• MesoNet
• XceptionNet
• Capsule Networks
Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018, December). Mesonet: a compact facial video
forgery detection network. In 2018 IEEE International Workshop on Information Forensics and
Security (WIFS) (pp. 1-7).
Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019).
Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE
International Conference on Computer Vision (pp. 1-11).
Nguyen, H. H., Yamagishi, J., & Echizen, I. (2019, May). Capsule-forensics: Using capsule networks to
detect forged images and videos. In ICASSP 2019-2019 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) (pp. 2307-2311). IEEE.

CNN-based approaches
• Exploiting the temporal dimension using recurrent neural networks
Sabir, E., Cheng, J., Jaiswal, A., AbdAlmageed, W., Masi, I., & Natarajan, P. (2019). Recurrent Convolutional Strategies for Face Manipulation
Detection in Videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 80-87).

Frequency domain GAN-fake face detection
• Exploiting frequency-aware decomposed image components (FAD) and local frequency statistics (LFS)
• Fusion of these features
Qian, Y., Yin, G., Sheng, L., Chen, Z., & Shao, J. (2020, August). Thinking in frequency: Face forgery detection by mining frequency-aware clues.
In European Conference on Computer Vision (pp. 86-103). Springer, Cham.

Frequency domain GAN-fake face detection
• Two similar approaches, exploiting the fact that common up-sampling methods (known as up-convolution or transposed convolution) leave such models unable to correctly reproduce the spectral distributions of natural training data.
Wang, S. Y., Wang, O., Zhang, R., Owens, A., & Efros, A. A. (2020). CNN-generated images are surprisingly easy to spot... for now. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (Vol. 7).
Durall, R., Keuper, M., & Keuper, J. (2020). Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce
Spectral Distributions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7890-7899).
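The spectral effect of up-sampling can be illustrated with a 1-D toy example. This is only a sketch under simplifying assumptions: it covers the zero-insertion step of a stride-2 transposed convolution (the learned kernel that follows is not modelled), and the signal values are made up. Zero insertion replicates the original spectrum, so the up-sampled signal carries as much energy at its highest frequency as at DC.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a toy signal)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def upsample_zero_insert(x, factor=2):
    """Insert (factor - 1) zeros after every sample."""
    out = []
    for value in x:
        out.append(value)
        out.extend([0.0] * (factor - 1))
    return out

# Hypothetical smooth, low-frequency signal
signal = [1.0, 0.8, 0.5, 0.3, 0.2, 0.3, 0.5, 0.8]
spec = [abs(c) for c in dft(signal)]
spec_up = [abs(c) for c in dft(upsample_zero_insert(signal))]

# The up-sampled spectrum is the original spectrum repeated twice
for k, magnitude in enumerate(spec_up):
    print(k, round(magnitude, 3), round(spec[k % len(signal)], 3))
```

A detector working in the frequency domain looks for this kind of high-frequency energy that natural images do not show.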

Performance of SotA Approaches on FF++
Method | Dataset | Metric | Performance
Matern et al. (2019) | FF++ / DFD | AUC | 0.78
Li et al. (2019) | FF++ / DFD | AUC | 0.93
Li et al. (2020) | FF++ | AUC | 0.99
Afchar et al. (2018) | FF++ (NeuralTextures) | Acc | 85%
Rossler et al. (2019) | FF++ (all video qualities) | Acc | 91%
Nguyen et al. (2019) | FF++ (all video qualities) | Acc | 92%
Masi et al. (2020) | FF++ (all video qualities) | Acc | 94%
Ciftci et al. (2020) | FF++ | Acc | 94%
Qi et al. (2020) | FF++ (high quality) | Acc | 98%
Qian et al. (2020) | FF++ | Acc | 93%
The FaceForensics++ dataset does not pose any challenge to SotA methods.

Performance of SotA methods on DFDC
The DFDC highlights the generalization challenge faced by SotA methods.

Context of our Research
WeVerify (https://weverify.eu/), Innovation Action: 2018-2021
• Problem-driven: real-world testing and issues
• Close interaction with end users (journalists, citizens)
AI4Media (https://ai4media.eu/), Research & Innovation Action: 2020-2024
• Research-driven: improve SotA and leverage new advances in AI
• Close interaction with leading researchers

DeepFake Detection Challenge
• Goal: detect videos with facial or voice manipulations
• 2,114 teams participated in the challenge
• Log Loss error evaluation on public and private validation sets
• Public evaluation contained videos with transformations similar to those in the training set
• Private evaluation contained organic videos and videos with unknown transformations from the Internet
• Our final standings:
• public leaderboard: 49 (top 3%) with 0.295 Log Loss error
• private leaderboard: 115 (top 5%) with 0.515 Log Loss error
Source: https://www.kaggle.com/c/deepfake-detection-challenge
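For reference, the Log Loss metric used above can be sketched as a mean binary cross-entropy. The label convention (1 = FAKE, 0 = REAL), the clipping constant and the example predictions are assumptions made for illustration.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; labels are 0 (REAL) or 1 (FAKE)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

labels = [1, 0, 1, 0]
cautious = [0.7, 0.3, 0.7, 0.3]            # always right, never very sure
overconfident = [0.99, 0.01, 0.99, 0.99]   # last prediction is confidently wrong

print(log_loss(labels, cautious))
print(log_loss(labels, overconfident))
```

A single confidently wrong prediction dominates the score, which is one reason why models that overfit the public set can drop sharply on unseen data.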

DFDC Dataset
• An order of magnitude bigger:
• Number of videos
• Number of frames
• Number of subjects
• Subject consent
• More deepfake generation methods
Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang,
M., & Ferrer, C. C. (2020). The DeepFake Detection
Challenge Dataset. arXiv preprint arXiv:2006.07397.

DeepFake Detection Challenge - dataset
• Dataset of more than 110k videos
• Approx. 20k REAL and the rest are FAKE
• FAKE videos generated from the REAL
• Models used:
• DeepFake AutoEncoder (DFAE)
• Morphable Mask faceswap (MM/NN)
• Neural Talking Heads (NTH)
• FSGAN
• StyleGAN
Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang,
M., & Ferrer, C. C. (2020). The DeepFake Detection Challenge
Dataset. arXiv preprint arXiv:2006.07397.

Dataset preprocessing - Issues
• Face dataset quality depends on face extraction accuracy (Dlib, mtcnn, facenet-pytorch, Blazeface)
• In general, all face extraction libraries produce some false positive detections
• Manual tuning can improve the quality of the generated dataset
Pipeline: Video corpus → Frame extraction → Face extraction → Deep learning model

Noisy data creeping into the training set
• Extracting faces at 1 fps from Kaggle DeepFake Detection Challenge dataset videos using the PyTorch implementation of MTCNN face detection
• Observation: within a video, false detections are fewer than true detections
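Sampling at 1 fps amounts to keeping roughly one frame per second of video. A minimal sketch of the index selection follows; the function name and the example frame rate and length are hypothetical, and decoding the frames and running the face detector are left out.

```python
def frame_indices_at_rate(total_frames, video_fps, target_fps=1.0):
    """Indices of the frames to keep when sampling at `target_fps`."""
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))

# Hypothetical clip: 300 frames recorded at 30 fps
indices = frame_indices_at_rate(300, 30.0)
print(len(indices), indices[:3])
```

Each selected frame would then be passed to the face detector.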

Our “noise” filtering approach
• Compute face embeddings for each detected face in video
• Similarity calculation between all face embeddings in a video → similarity graph construction
• Nodes represent faces; two faces are connected if their similarity is greater than 0.8 (solid lines)
• Drop components smaller than N/2 (e.g. component 2), where N is the number of frames that contain face detections (true or false)
Charitidis, P., Kordopatis-Zilos, G., Papadopoulos, S., & Kompatsiaris, Y. (2020). Investigating the impact of preprocessing and prediction
aggregation on the DeepFake detection task. Proceedings of the Conference for Truth and Trust Online (TTO), https://arxiv.org/abs/2006.07084
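The filtering steps above can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: cosine similarity is assumed as the similarity measure, the embeddings are made-up 2-D vectors, and connected components are found with a simple union-find.

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_detections(detections, threshold=0.8):
    """detections: list of (frame_index, embedding) pairs.

    Returns the indices of detections that belong to a connected
    component with at least N/2 members, where N is the number of
    frames that contain at least one detection.
    """
    n = len(detections)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Connect two faces when their similarity exceeds the threshold
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(detections[i][1], detections[j][1]) > threshold:
                parent[find(i)] = find(j)

    components = defaultdict(list)
    for i in range(n):
        components[find(i)].append(i)

    n_frames = len({frame for frame, _ in detections})
    kept = [
        i
        for members in components.values()
        if len(members) >= n_frames / 2
        for i in members
    ]
    return sorted(kept)

# Four frames with a true face each; frame 1 also has a false detection
detections = [
    (0, [1.00, 0.05]),
    (1, [0.98, 0.10]),
    (1, [0.00, 1.00]),  # false positive, dissimilar to the rest
    (2, [1.00, 0.00]),
    (3, [0.95, 0.00]),
]
kept = filter_detections(detections)
print(kept)
```

With multiple people in a video, each person forms a separate component, so the component membership can also be reused for per-face predictions.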

Advantages
• Simple and fast procedure
• No need for manual tuning of the face extraction settings
• Clusters of distinct faces in cases of multiple persons in the video
• This information can be utilized in various ways (e.g. predictions per face)
Faces extracted from multiple video frames
Component 1
Component 2

Experiments
• We trained multiple DeepFake detection models on the DFDC dataset
with and without (baseline) our proposed approach
• Three datasets: a) Celeb-DF, b) FaceForensics++, c) DFDC subset
• For evaluation we examined two aggregation approaches
• avg: prediction is the average of all face predictions
• face: prediction is the maximum of the per-face average predictions
• Results for the EfficientNet-B4 model in terms of Log loss error:
Preprocessing | Celeb-DF avg | Celeb-DF face | FaceForensics++ avg | FaceForensics++ face | DFDC avg | DFDC face
baseline | 0.510 | 0.511 | 0.563 | 0.563 | 0.213 | 0.198
proposed | 0.458 | 0.456 | 0.497 | 0.496 | 0.195 | 0.173
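The two aggregation approaches compared in the table can be sketched as follows. The input layout (a dictionary from face identity to per-frame fake probabilities) and the example values are assumptions made for illustration.

```python
def aggregate_avg(face_preds):
    """avg: mean over every face prediction in the video."""
    all_preds = [p for preds in face_preds.values() for p in preds]
    return sum(all_preds) / len(all_preds)

def aggregate_face(face_preds):
    """face: maximum over the per-face mean predictions."""
    return max(sum(preds) / len(preds) for preds in face_preds.values())

# Hypothetical video with two people, only one of them manipulated
video = {
    "person_a": [0.1, 0.2, 0.1],
    "person_b": [0.9, 0.8, 0.9],
}
print(aggregate_avg(video), aggregate_face(video))
```

In this example the plain average is diluted by the untouched face, while the per-face maximum still reflects the manipulated one.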

Our DFDC Approach - details
• Applied proposed preprocessing approach to clean the generated face dataset
• Face augmentation:
• Horizontal & vertical flip, random crop, rotation, image compression, Gaussian & motion blurring, brightness, saturation & contrast transformation
• Trained three different models: a) EfficientNet-B3, b) EfficientNet-B4, c) I3D*
• Models trained on face level:
• I3D trained with 10 consecutive face images, exploiting temporal information
• EfficientNet models trained on single face images
• Per model:
• Added two dense layers with dropout after the backbone architecture with 256 and 1 units
• Used the sigmoid activation for the last layer
* ignoring the optical flow stream

Our DFDC approach – inference
pre-processing → model inference → post-processing

Lessons from other DFDC teams
• Most approaches ensemble multiple EfficientNet architectures (B3-B7), some of them trained with different seeds
• ResNeXt was another architecture used by top-performing solutions, combined with 3D architectures such as I3D, 3D ResNet34, MC3 & R(2+1)D
• Several approaches increased the margin of the detected facial bounding box to further improve results.
• We used an additional margin of 20% but other works proposed a higher proportion.
• To improve generalization:
• Domain-specific augmentations: a) half face removal horizontally or vertically, b) landmark (eyes, nose, or mouth) removal
• Mixup augmentations
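The bounding-box margin mentioned above can be sketched as a small helper. How the 20% is applied is an assumption here: the box grows by `margin` times its width and height in total, split evenly between the two sides, and is clamped to the frame. The function name and box format (x1, y1, x2, y2) are hypothetical.

```python
def expand_box(box, margin, frame_w, frame_h):
    """Grow a face box by `margin` (fraction of its size) and clamp to the frame."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * margin / 2  # half of the extra width on each side
    dy = (y2 - y1) * margin / 2  # half of the extra height on each side
    return (
        max(0, int(round(x1 - dx))),
        max(0, int(round(y1 - dy))),
        min(frame_w, int(round(x2 + dx))),
        min(frame_h, int(round(y2 + dy))),
    )

print(expand_box((100, 100, 200, 200), 0.2, 640, 480))
print(expand_box((0, 0, 100, 100), 0.2, 105, 105))
```

A wider crop keeps more of the blending boundary and surrounding context in the classifier input.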

Practical challenges
• Limited generalization
• This observation applies to most submissions. The winning team scored 0.20336 in public validation and only 0.42798 in the private (Log Loss)
• Overfitting
• The best submission in the public leaderboard scored 0.19207 but in the private evaluation the error was 0.57468, leading to the 904th position!
• Broad problem scope
• The term DeepFake may refer to every possible manipulation and generation
• Constantly increasing manipulation and generation techniques
• A detector is only trained with a subset of these manipulations

DeepFake Detection in the Wild
• Videos in the wild usually contain multiple scenes
• Only a subset of these scenes may contain DeepFakes
• Detection process might be slow for multi-shot videos (even short ones)
• Low quality videos
• Low quality faces tend to fool classifiers
• Small detected and fast-moving faces
• Usually lead to noisy predictions
• Changes in the environment
• Moving obstacles in front of the faces
• Changes in lighting

DeepFake Detection Service @ WeVerify
https://www.youtube.com/watch?v=cVljNVV5VPw&ab_channel=TheFakening

Thank you!
Dr. Symeon Papadopoulos
papadop@iti.gr
@sympap
Media Verification (MeVer)
https://mever.iti.gr/
@meverteam
Ack. Polychronis Charitidis, George Kordopatis-Zilos,
Nikos Sarris and Yiannis Kompatsiaris
