Vulnerabilities of machine learning infrastructure

Vulnerabilities of Machine
Learning Infrastructure
Sergey Gordeychik
serg.gordey@gmail.com
http://scada.sl
@scadasl

Sergey Gordeychik
 AI and Cybersecurity Executive
• Abu Dhabi, UAE
 Visiting Professor, Cyber Security
• Harbour.Space University, Barcelona, Spain
 Bandleader, www.GradeZero.band
 Cyber-physical troublemaker
• SCADA Strangelove, HackingOdyssey
• www.scada.sl, @scadasl
 Ex…
• Deputy CTO, Kaspersky Lab
• CTO, Positive Technologies
• Gartner recognized products and services
 Program Chair, PHDays Conference
• www.phdays.com, Moscow

Disclaimer
Please note that this talk is by Sergey and the Hacking Odyssey group.
We don't speak for our employers.
All the opinions and information here are our responsibility. So mistakes and bad
jokes are all OUR responsibility.
https://github.com/sdnewhop
https://scada.sl/
Hacking Odyssey Group
Sergey Gordeychik
Anton Nikolaev
Denis Kolegov
Maria Nedyak
Roman Palkin
Hacking Odyssey Projects
Grinder Framework
AISec
DICOM Sec
SD-WAN New Hop

PWN?
Adversarial example
anyone?

Hacking as usual…
https://slideplayer.com/slide/4378533/

Spherical AI traveling in a vacuum?

What is Cyber?
What is
Cybersecurity?

Cybersecurity goals?
HOLY
CIA
TRINITY

OT/ICS/SCADA Security?!
SCADA Security Basics: Integrity Trumps Availability, ISA/IEC 62443-2-1 standards (formerly ISA-99)
https://www.tofinosecurity.com/blog/scada-security-basics-integrity-trumps-availability
Marina Krotofil, Damn Vulnerable Chemical Process
https://fahrplan.events.ccc.de/congress/2014/Fahrplan/system/attachments/2560/original/31CC_2014_Krotofil.pdf

Machine Learning and AI?
AI security

Upside down?
https://giphy.com/explore/upside-down

https://giphy.com/gifs/movie-trailer-minions-yoJC2k4dPDRSInYfjq

James Mickens, Harvard University, USENIX Security '18-Q: Why
Do Keynote Speakers Keep Suggesting That Improving Security Is
Possible?
https://www.youtube.com/watch?v=ajGX7odA87k

Mission-centric Cybersecurity
Gapanovich, Rozenberg, Gordeychik, Signalling cyber security: the need for a mission-centric approach
https://www.railjournal.com/in_depth/signalling-cyber-security-the-need-for-a-mission-centric-approach
A process that ensures control object operation with no dangerous failures or
damage, but with a set economic efficiency and reliability, under adversarial
anthropogenic information influence.

But what about?...
dangerous failures?
economic efficiency?
reliability level?

But what about?...
dangerous failures?
economic efficiency?
reliability level?
Build the Threat Model First!

AI Threat Model
Li, K. (n.d.). Reverse Engineering AI Models.

But what about?...
Cloud
AUC/ROC
Privacy
IP protection
Federated learning
Insane androids?…
AI security

NCC Group, Building safer machine learning
https://www.nccgroup.trust/uk/about-us/newsroom-and-events/blogs/2018/august/building-safer-machine-learning-systems-a-threat-model/

You should
scan all
these
Internets for
AI

Grinder Framework
github.com/sdnewhop/grinder

AIFinger Project
The goal of the project is to provide tools and results for passive and active fingerprinting of
Machine Learning frameworks and applications using a common Threat Intelligence
approach, and to answer the following questions:
 How can ML backend systems be detected on the Internet and in enterprise networks?
 Are ML apps secure at Internet scale?
 What is the general security level of ML apps at present?
 How long does it take to patch vulnerabilities and apply security updates to the ML
backend systems deployed on the Internet?
sdnewhop.github.io/AISec/
github.com/sdnewhop/AISec
Contributors:
● Sergey Gordeychik
● Anton Nikolaev
● Denis Kolegov
● Maria Nedyak
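Passive fingerprinting, as described in the project goals above, can be pictured as matching known markers against an HTTP response body. The sketch below is only an illustration: the signature patterns are hypothetical and are not the actual Grinder or AIFinger rules.

```python
import re
from typing import List

# Hypothetical signatures for illustration only; not the real AIFinger rules.
SIGNATURES = {
    "TensorBoard": re.compile(r"<title>\s*TensorBoard\s*</title>", re.IGNORECASE),
    "NVIDIA DIGITS": re.compile(r"<title>[^<]*DIGITS[^<]*</title>", re.IGNORECASE),
    "TensorFlow.js": re.compile(r"tf(\.min)?\.js", re.IGNORECASE),
}


def fingerprint(body: str) -> List[str]:
    """Return the names of all signatures that match an HTTP response body."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(body)]


# A made-up response body standing in for a page fetched by a scanner.
sample = "<html><head><title>TensorBoard</title></head><body></body></html>"
print(fingerprint(sample))
```

A real scanner would combine many such markers (headers, favicons, API paths) to cut false positives.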

AIFinger Project Coverage
 Frameworks
○ TensorFlow
○ NVIDIA DIGITS
○ Caffe
○ TensorBoard
○ TensorFlow.js
○ brain.js
○ Predict.js
○ ml5.js
○ Keras.js
○ Figue.js
○ Natural.js
○ neataptic.js
○ ml.js
○ Clusterfck.js
○ Neuro.js
○ Deeplearn.js
○ Convnet.js
○ Synaptic.js
○ Apache MXNet
 Databases with ML Content
○ Elasticsearch with ML data
○ MongoDB with ML data
○ Docker API with ML data
 Databases
○ Elasticsearch
○ Kibana (Elasticsearch Visualization Plugin)
○ Gitlab
○ Samba
○ Rsync
○ Riak
○ Redis
○ Redmon (Redis Web UI)
○ Cassandra
○ Memcached
○ MongoDB
○ PostgreSQL
○ MySQL
○ Docker API
○ CouchDB
 Job and Message Queues
○ Alibaba Group Holding AI Inference
○ Apache Kafka Consumer Offset Monitor
○ Apache Kafka Manager
○ Apache Kafka Message Broker
○ RabbitMQ Message Broker
○ Celery Distributed Task Queue
○ Gearman Job Queue Monitor
 Interactive Voice Response (IVR)
○ ResponsiveVoice.JS
○ Inference Solutions
 Speech Recognition
○ Speech.js
○ dictate.js
○ p5.speech.js
○ artyom.js
○ SpeechKITT
○ annyang
Measuring Artificial Intelligence and Machine Learning Implementation Security on the Internet
https://www.researchgate.net/publication/337771481_Measuring_Artificial_Intelligence_and_Machine_Learning_Implementation_Security_on_the_Internet

Results (April 2020)
http://www.scada.sl/2020/04/ai-internet-census-april-2020.html

NVIDIA DIGITS
 Training logs
 Datasets
 Model design

TensorBoard
 …
 Everything
 + vulns
The TensorFlow server is meant
for internal communication only.
It is not built for use in an
untrusted network.
More than 120 results in total

June 2020
https://www.microsoft.com/security/blog/2020/06/10/misconfigured-kubeflow-workloads-are-a-security-risk/
A large-scale campaign against Kubernetes and Kubeflow clusters
abused exposed Kubernetes dashboards to deploy a cryptocurrency miner.
A suspect image from a public repository was observed being deployed
on many different clusters. The image
is ddsfdfsaadfs/dfsdf:99. By inspecting the image's layers, we can
see that this image runs an XMRig miner:

How to find an ML server on the Internet?

Crypto currency on GPGPU in 2019?
https://www.zoomeye.org/searchResult?q=%2Bport%3A%225555%22%20%2Bservice%3A%22http%22%20NVIDIA

DGX-1
 8 Tesla V100-32GB
 TFLOPS (deep learning) 1000
 CUDA Cores 40,960
 Tensor Cores 5,120
 $130,000
 Good hashcat rate :)
NetNTLMv2: 28912.2 MH/s
MD5: 450.0 GH/s
SHA-256: 59971.8 MH/s
MS Office 2013: 163.5 kH/s
bcrypt $2*$, Blowfish (Unix): 434.2 kH/s
https://hashcat.net/forum/thread-6972.html
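To put these rates in perspective, here is a rough worked calculation of how long a full keyspace search would take. The assumptions are mine, not the slide's: passwords of exactly 8 characters drawn from the 95 printable ASCII characters, a single DGX-1 running at the benchmark rates listed above, and a worst-case search of the whole keyspace.

```python
# Rates copied from the slide, converted to hashes per second.
RATES = {
    "MD5": 450.0e9,
    "SHA-256": 59971.8e6,
    "NetNTLMv2": 28912.2e6,
    "bcrypt": 434.2e3,
}

ALPHABET = 95  # assumption: printable ASCII
LENGTH = 8     # assumption: exactly 8 characters


def exhaust_seconds(alphabet_size, length, rate):
    """Seconds needed to try every candidate at the given hash rate."""
    return alphabet_size ** length / rate


for name, rate in RATES.items():
    hours = exhaust_seconds(ALPHABET, LENGTH, rate) / 3600
    print(f"{name}: {hours:,.1f} hours")
```

Under these assumptions the fast hashes fall within hours or days, while bcrypt stays out of reach for centuries, which is the practical argument for slow password hashes.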

Ok, let’s scan!
Nmap scan report for X.X.X.X
Host is up (0.010s latency).
Not shown: 991 closed ports
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 6.0p1 Debian 4 (protocol 2.0)
80/tcp open http lighttpd
427/tcp open svrloc?
443/tcp open ssl/http lighttpd
623/udp open ipmi
554/tcp filtered rtsp
1723/tcp filtered pptp
5120/tcp open barracuda-bbs?
5988/tcp open wbem-http?
5989/tcp open ssl/wbem-https?
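When scanning many hosts, output like the report above is easier to triage once it is parsed into records. The following is a minimal sketch that assumes the plain-text layout shown on the slide (port/protocol, state, service, optional version); the function name and the shortened sample are illustrative.

```python
import re

# One port line: "<port>/<proto> <state> <service> [version...]"
PORT_LINE = re.compile(r"^(\d+)/(tcp|udp)\s+(\S+)\s+(\S+)(?:\s+(.*))?$")


def parse_ports(report):
    """Turn the port table of an nmap text report into a list of dicts."""
    rows = []
    for line in report.splitlines():
        match = PORT_LINE.match(line.strip())
        if not match:
            continue  # header or non-port line
        port, proto, state, service, version = match.groups()
        rows.append({
            "port": int(port),
            "proto": proto,
            "state": state,
            "service": service.rstrip("?"),  # nmap marks guesses with '?'
            "version": version or "",
        })
    return rows


# Shortened copy of the slide's report.
REPORT = """\
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 6.0p1 Debian 4 (protocol 2.0)
80/tcp open http lighttpd
623/udp open ipmi
5120/tcp open barracuda-bbs?
"""

rows = parse_ports(REPORT)
ipmi_ports = [r["port"] for r in rows if r["service"] == "ipmi" and r["state"] == "open"]
print(ipmi_ports)
```

An open IPMI port on an Internet-facing host is the finding that matters here: it points at a BMC.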

I have only one question!
http://www.demotivation.us/i-have-only-one-question-1267735.html
Why is it
still
enabled
by default
in 2020?
What do
you
need a
helmet
for?
How will a complex password help?!!

Strange certificate
Issued by Quanta Computers Inc?
128-byte (1024-bit) RSA key?..
Issued on 17 April 2017…
Same serial number all over the Internet!!!

Find and decode firmware
Google for Quanta Computers BMC firmware
binwalk
7-zip
Voilà

Grep the cert and keys
TLS services on the BMC use RSA 1024 with weak ciphers and default
Diffie-Hellman primitives.
The private and public keys are hardcoded in the firmware and are the same
for many instances of the Quanta Computers BMC, including NVIDIA DGX-1.
Both keys can be found unencrypted in the firmware.
This allows an attacker to passively decrypt network communications, with no
MITM position required.
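The "grep" step can be sketched as a walk over an already extracted firmware tree that looks for PEM-armoured keys and certificates. This is an illustration only: the directory layout and file names are made up, and the demo writes a dummy block into a temporary folder instead of touching real firmware.

```python
import os
import re
import tempfile

# Matches PEM blocks such as "RSA PRIVATE KEY", "PRIVATE KEY" or "CERTIFICATE".
PEM_BLOCK = re.compile(
    rb"-----BEGIN ([A-Z ]*(?:PRIVATE KEY|CERTIFICATE))-----.*?-----END \1-----",
    re.DOTALL,
)


def find_pem_blocks(root):
    """Return (path, block type, byte offset) for every PEM block under root."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # unreadable entry, e.g. a dangling symlink
            for match in PEM_BLOCK.finditer(data):
                hits.append((path, match.group(1).decode("ascii"), match.start()))
    return hits


# Demo on a throwaway tree with a dummy (non-functional) key block.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "conf"))
    with open(os.path.join(root, "conf", "demo.bin"), "wb") as fh:
        fh.write(b"\x00junk-----BEGIN RSA PRIVATE KEY-----\nAAAA\n"
                 b"-----END RSA PRIVATE KEY-----\x00")
    hits = find_pem_blocks(root)

for path, kind, offset in hits:
    print(kind, "at offset", offset, "in", os.path.basename(path))
```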

Other greps?
NetNTLMv2: 28912.2 MH/s
MD5: 450.0 GH/s
SHA-256: 59971.8 MH/s
MS Office 2013: 163.5 kH/s
bcrypt $2*$, Blowfish (Unix): 434.2 kH/s
Can we use DGX to bruteforce DGX password hash?!

IPMI passwords
/conf/BMC1/IPMIConfig.dat

…and decryption
Blowfish without an IV is used, as implemented in libblowfish.so.2.5.0
Hint:

Lesson learned
• Please don’t use one-way hashing with salt. Use plaintext or reversible
encryption.
• The password encryption key should be hardcoded and stored in the same folder as the
user database.
• It is important to keep it like the product name.
• Store it in several places across the filesystem for resilience.
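Read without the sarcasm, the slide argues for salted one-way hashing. A minimal sketch with the Python standard library follows; the iteration count and function names are assumptions, and whether a BMC can store passwords this way depends on what its protocols need from the stored secret.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the target hardware


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a random salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("admin", salt, stored))
```

Nothing stored on disk here can be reversed with a key found next to the user database.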

Hardcoded RC4 Key in JViewer-SOC
• JViewer-SOC (the KVM and IPMI applet) uses the RC4 cipher with a hardcoded key for traffic
encryption.
• In the JViewer-SOC Java applet, the com.ami.kvm.jviewer.soc.video package contains the Decoder
class.
• This class defines the DecodeKeys constant, which is equal to “fedcba9876543210”.
• The constant is used to initialize the RC4 key scheduling (expansion) algorithm.
This allows an attacker to bypass security features, decrypt traffic and extract sensitive
information.
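Why a hardcoded RC4 key is fatal can be shown in a few lines: RC4 is symmetric, so anyone who knows the key can run the same routine over captured bytes. The sketch below is a textbook RC4; treating the constant as ASCII bytes and the sample "traffic" are assumptions, since the slide does not show how the applet feeds the key in.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: the same call encrypts and decrypts."""
    # Key-scheduling algorithm (KSA)
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Assumption: the constant is used as ASCII bytes.
HARDCODED_KEY = b"fedcba9876543210"

captured = rc4(HARDCODED_KEY, b"user=admin")  # stands in for sniffed traffic
recovered = rc4(HARDCODED_KEY, captured)      # attacker applies the same key
print(recovered)
```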

Insecure random number generator in RAKP/AES
• JSOL.jar/com/ami/jsol/common/Util.java defines the functions random4ByteArray
and random16ByteArray.
• Both use the java.util.Random class.
• These functions are used within the RAKP crypto protocol implementation.
• According to its specification, RAKP is based on the Bellare-Rogaway
protocols.
• The issue is that these protocols require cryptographically secure random
numbers.
The same function is used to generate the IV for AES encryption in the processEncryption function
of the IPMISession class.
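The problem with java.util.Random is that it is a 48-bit linear congruential generator, so a few observed outputs reveal its internal state. The sketch below re-implements the documented algorithm in Python and recovers the state from two consecutive 32-bit outputs by brute-forcing the 16 hidden bits. How the applet turns outputs into byte arrays is not shown on the slide, so working on raw nextInt() values is an assumption.

```python
MULT = 0x5DEECE66D
ADD = 0xB
MASK = (1 << 48) - 1


class JavaRandom:
    """Python model of java.util.Random (outputs kept unsigned)."""

    def __init__(self, seed):
        self.state = (seed ^ MULT) & MASK

    def next_int(self):
        # Equivalent of next(32): advance the LCG, expose the top 32 bits.
        self.state = (self.state * MULT + ADD) & MASK
        return self.state >> 16


def recover_state(out1, out2):
    """Find the generator state after out2 from two consecutive outputs."""
    for low in range(1 << 16):           # the 16 bits hidden from the observer
        candidate = (out1 << 16) | low
        following = (candidate * MULT + ADD) & MASK
        if following >> 16 == out2:
            return following
    return None


victim = JavaRandom(seed=12345)          # the seed is unknown to the attacker
seen1, seen2 = victim.next_int(), victim.next_int()

state = recover_state(seen1, seen2)
predicted = ((state * MULT + ADD) & MASK) >> 16
actual = victim.next_int()
print(predicted == actual)
```

A cryptographically secure generator (java.security.SecureRandom in Java) does not allow this kind of state recovery.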

CSRF is not an issue….
A Cross-Site Request Forgery (CSRF) vulnerability was found in the NVIDIA BMC
web service. It allows an attacker to force an authenticated user to call API
endpoints within the web application.
The following internal requests require an authenticated session but do not
require a CSRF token:
/rpc/getsessiontoken.asp
/rpc/getrole.asp
/rpc/getadvisercfg.asp
/rpc/getvmediacfg.asp
/rpc/flash_browserclosed.asp
/rpc/getvideoinfo.asp
/rpc/downloadvideo.asp
/rpc/restarthttps.asp
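One common countermeasure is a per-session token that a forged cross-site request cannot know. The sketch below shows the idea with an HMAC over the session identifier; it is a generic illustration with hypothetical names, not a description of how NVIDIA fixed the issue.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret, generated once at start-up.
SERVER_SECRET = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive the token handed to the legitimate page for this session."""
    return hmac.new(SERVER_SECRET, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_request_allowed(session_id: str, submitted_token) -> bool:
    """Reject state-changing requests that lack the matching token."""
    if not submitted_token:
        return False
    expected = issue_csrf_token(session_id)
    return hmac.compare_digest(expected, submitted_token)


session = "session-1234"                      # made-up session identifier
token = issue_csrf_token(session)
print(is_request_allowed(session, token))     # request from the real UI
print(is_request_allowed(session, None))      # forged request without a token
```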

Unrestricted SignImage key upload
The SignImage upload feature in the DGX-1 BMC accepts any well-formed RSA 1024 public key without any verification.
This key is used to verify the firmware signature.
The SignImage upload routine, implemented in the WebValidateSignImageKey function of libifc.so.2.42.0, accepts any
well-formed RSA 1024 public key without verifying its authenticity and stores it in
/conf/public.pem.
The CheckImageSign function, implemented in libipmimsghndlr.so, uses public.pem to verify the firmware signature.

Unrestricted File Upload through CSRF
The web server handler libmodhapi.so defines a stripped function at address 0x8BE0.
This function is called when an authorized user sends a POST request to
/page/file_upload.html.
If the POST request is multipart/form-data, the function checks the file argument and, if its name
does not end with a ‘/’ symbol, looks up a file path in the hardcoded
file-argument-name-to-file-path mapping.
However, if the argument name ends with ‘/’, the file is saved to the file system at the path given by the
argument name followed by the filename.
Thus it is possible to upload custom files and overwrite existing ones at a user-defined
absolute path.
Example attack vector: overwrite the ./shadow or ./passwd file in the “/conf/” folder to create or modify
users and/or replace the default shell to get remote root access via SSH.
The vulnerability can be exploited via CSRF.
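The missing check can be sketched as follows: resolve the requested name against a fixed upload directory and refuse anything that lands outside it. The upload root and function name are hypothetical; this is a generic illustration of path confinement, not the vendor's patch.

```python
import os

UPLOAD_ROOT = "/var/upload"  # hypothetical upload directory


def resolve_upload_path(root, requested):
    """Return the absolute target path, or raise if it escapes root."""
    root_real = os.path.realpath(root)
    # os.path.join discards root_real when `requested` is absolute,
    # which is exactly the case that has to be caught below.
    target = os.path.realpath(os.path.join(root_real, requested))
    if os.path.commonpath([root_real, target]) != root_real:
        raise ValueError("upload path escapes the upload directory: %r" % requested)
    return target


print(resolve_upload_path(UPLOAD_ROOT, "firmware.bin"))

for attempt in ("/conf/shadow", "../../conf/passwd"):
    try:
        resolve_upload_path(UPLOAD_ROOT, attempt)
    except ValueError as exc:
        print("rejected:", exc)
```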

List of fixes
AISec-NV-2019-01 - Hardcoded admin user (CVE-2020-11483)
AISec-NV-2019-03 - SNMP with well-known community strings enabled by default (CVE-2020-11489)
AISec-NV-2019-04 - Hardcoded RSA keys and self-signed certificate for TLS (CVE-2020-11487)
AISec-NV-2019-10 - Insecure random number generator in RAKP/AES (CVE-2020-11616)
AISec-NV-2019-11 - Hardcoded RC4 Key in JViewer-SOC (CVE-2020-11615)
AISec-NV-2019-15 - Internal methods are vulnerable to CSRF attack (CVE-2020-11485)
AISec-NV-2019-16 - Unrestricted File Upload through CSRF (CVE-2020-11486)
AISec-NV-2019-17 - Hardcoded IPMI passwords encryption key (CVE-2020-11484)
AISec-NV-2019-18 - Unrestricted SignImage key upload (CVE-2020-11488)
Credits: Sergey Gordeychik, Maria Nedyak, Denis Kolegov, Roman Palkin

Any bugs there?
We don’t know yet

Disclosure timeline
Tue, 3 Sep 2019, 16:42 – Initial submission
Thu, 19 Sep 2019, 00:40 – List of Internet-facing DGXs collected by Grinder
Sun, 22 Sep 2019, 23:05 – Ack and workaround discussion
Sat, 5 Oct 2019, 19:50 – Remote root submission
Tue, 17 Dec 2019, 21:00 – Call with Alex Matrosov to discuss soooo responsible
disclosure
Feb 2020 – COVID 19 outbreak, cancellation of PHDays and OFFZONE
April – Aug 2020 – GradeZero Rock’n’roll
Tue, 25 Aug, 21:10 – Failed fix (QA issues)
Now – Fixes, Initial disclosure @CodeBlue 2020
Kudos to Alex, Shawn, NVIDIA PSIRT

Supply chain is a pain
Megarac SP (DGX-1)
Quanta Computer Inc.
IBM (BMC Advanced System Management)
Lenovo (ThinkServer Management Module)
Hewlett Packard Enterprise Megarac
Mikrobits (Mikrotik)
Megarac SP-X (DGX-2)
Netapp
ASRockRack IPMI
ASUS ASMB9-iKVM
DEPO Computers
TYAN Motherboard
Gigabyte IPMI Motherboards
Gooxi BMC

Takeaways
• A Big Thing doesn’t mean good security
• Good AI researchers are bad cybersecurity pros
• All vulnerabilities are important
• Supply chain is a pain
• Things are better with Grinder :)

Infection of the AI models
http://www.scada.sl/2019/11/malign-machine-learning-models-and-bad.html

More parameters -> longer training

Pre-trained model workflow
1. Model interface (some wrapper, CLI, etc.): .py / .sh / etc.
2. Download the weights in some form
3. Run the model
File formats: .pb / .h5 / .pth, .json / .yml / .csv

Distribution
• ~2k repos on GitHub
• ~100 repos on GitLab
• ~500 models on https://modelzoo.co/

Documentation
Whole model Weights only
PyTorch model (.pth)

Reality
Whole model Weights only

Step 1. Find an existing model

Step 2. Infect it!
Overwrite the magic number
`Classic` Pickle payload
Python code to execute on load
Shell code to run on load

Python Pickle Injection
 Pickle is a Python module used to 'serialize' an
object to a byte string and store it in or
load it from a file.
 Pickle is a simple stack language, which means
pickle has a variable stack.
• Every time it finishes 'deserializing' an object, it
stores it on the stack.
• When it reaches a '.' while 'deserializing', it
pops a variable from the stack.
 Besides, pickle has a temporary memo, like a
clipboard.
 'p0', 'p1' put the top object of the stack into the
memo under the key '0' or '1'.
 'g0', 'g1' get object '0' or '1' back from the memo.
 Python 2 ships two implementations, pickle and cPickle.
They differ in some details, such as the available
methods, but in most cases they
behave the same way.
http://xhyumiracle.com/python-pickle-injection/
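The stack machine described above can be watched in action with a harmless payload. In this sketch the `__reduce__` hook tells pickle to call `eval` on a fixed arithmetic string during loading; a real attack would name a more dangerous callable. The class name is made up, and protocol 0 is chosen so that the text opcodes from the slide ('p', '.', and so on) appear in the stream.

```python
import pickle
import pickletools


class Payload:
    """Benign stand-in for a malicious object."""

    def __reduce__(self):
        # (callable, args): pickle stores this pair and calls it on load.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload(), protocol=0)

# List the opcode names of the pickle program without executing it.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(data)]
print(opcodes)

# Loading runs the stored call and returns its result instead of a Payload.
result = pickle.loads(data)
print(result)
```

The REDUCE opcode is where the call happens, and the final STOP ('.') pops the result off the stack.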

Step 3. Upload it
Link to our malicious
file

• Just one command to run from anywhere!
• torch.hub.load("ChickenDuo/top", "model")

Cross-platform -> Another approach

Serialization
Saved Model
Graph
File (.pb)
Variables
Assets
Constants and static
Logic

Custom serialization
•Protobuf format (.pb)
•~1300 operations (math, conditionals, statistics, etc.)
•Only TWO of them were found dangerous
•WriteFile (any text, any file)
•ReadFile (any file)
Looks like Google
is aware of them

Graph serialization
[Figure: original graph: Some ops -> Result tensor. Infected graph: Some ops and Payload ops, a "Payload > result?" condition with True / False branches, Result tensor]

Code
Read the existing graph and rename the “ending” tensor
Execute func to determine which route to take (tensor or tensor)
Write it all back

Wrapper
Check if file exists
Append our payload to a file

Keras model
Serialization
Saved Model
Keras with h5
Weights only
Model from config

Serialization with topology
- Only Keras layers (Functional model)
- … has a Lambda layer, which serializes a
custom Python function with marshal
(https://github.com/keras-team/keras/blob/master/keras/layers/core.py#L566)
- No warning on launching third-party
models!
© keras.io
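Why a marshalled function inside a model file matters: marshal stores raw Python bytecode, and whoever loads it gets a callable back. The sketch below shows the round trip with the standard library only. The base64 wrapping and the function names are assumptions for illustration, not a copy of the Keras code.

```python
import base64
import marshal
import types


def custom_layer_fn(x):
    """Stands in for the function wrapped by a Lambda layer."""
    return x * 2


# "Saving": dump the function's code object and make it text-safe.
blob = base64.b64encode(marshal.dumps(custom_layer_fn.__code__)).decode("ascii")

# "Loading": rebuild a function from nothing but the stored blob.
code = marshal.loads(base64.b64decode(blob))
restored = types.FunctionType(code, {}, "restored_fn")

print(restored(21))
```

The loader has no way to tell a doubling function from one that opens a shell, so a model with such a layer is code, not data.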

Timeo Danaos et dona ferentes
https://github.com/pytorch/pytorch/issues/31875
`torch.load()` uses ``pickle`` module implicitly, which is known to be
insecure. It is possible to construct malicious pickle data which will
execute arbitrary code during unpickling. Never load data that could have
come from an untrusted source, or that could have been tampered with.
**Only load data you trust**.
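Where untrusted pickle data cannot be avoided, the Python documentation describes restricting which globals an unpickler may resolve. The sketch below follows that idea with a small allowlist; it is a generic illustration, not what `torch.load()` does, and real model files usually need a much longer allowlist. The Payload class is a benign stand-in for a malicious object.

```python
import builtins
import io
import pickle

# Only these builtins may be referenced by the pickle stream.
SAFE_BUILTINS = {"list", "dict", "set", "frozenset", "range", "complex"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream tries to import.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but with the allowlist above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless here, arbitrary code in an attack


print(restricted_loads(pickle.dumps([1, 2, 3])))

try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```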

Hacking Medical Imaging
http://www.scada.sl/2020/07/hacking-odyssey-at-hitblockdown002.html

https://www.nbcnews.com/now/video/controversial-tech-company-pitches-facial-recognition-to-track-covid-19-82638917537

Face recognition
 170 000 cameras across the city
 Face recognition system based
on FindFace technology
 The current face recognition
system operates on the "black
lists" (criminals, missing people)
 The system does not compare
all people caught in the camera
with all residents of Moscow!

Let’s check it out!
• Segmentation does not work
• Or works, but with poor accuracy
• Questions
• The presence of a biometric DB
• The relevance of the biometric DB
• Biometric attacks
• Use of masks, etc.
• False positive handling
https://www.betafaceapi.com/

Biometric DB
White List (anyone you can)
• Upload photos via the app
Blacklist (not allowed)
• Register when a COVID is
detected
• Other citizens ???
Where to get?
How to compare with the
person?

Biometrics attacks
 Presentation attack (liveness)
 Morphing attack
 CV Dazzle
 Aging effect

Jan Krissler, “Ich sehe, also bin ich ... Du”
 https://www.youtube.com/watch?v=VVxL9ymiyAU&t=1590

Small ad
https://harbour.space/cyber-security/courses/cybersecurity-of-machine-learning-and-artificial-intelligence
https://aftershock.news/?q=node/792241&full

What can we do?
For Researchers
AI cybersecurity is a green field
From SDN to Model Privacy, from Secure SDL to Adversarial
Robustness
For Enterprises
Don’t trust AI if adversarial “input” is possible
AI IS NOT a spherical model traveling in a vacuum!
For Governments
Centralize data and annotation
Force vendors to follow security best practices from the beginning
Detect and control AI-based abuses
