
Vulnerabilities of machine learning infrastructure



The boom in artificial intelligence has brought a set of impressive hardware and software solutions to the market. On the other hand, mass deployment of AI in various areas brings problems of its own, and security is one of the greatest concerns. The speaker will present the results of hands-on vulnerability research on different components of AI infrastructure, including NVIDIA DGX GPU servers; ML frameworks such as PyTorch, Keras, and TensorFlow; data processing pipelines; and specific applications, including medical imaging and face recognition-powered CCTV. An updated Internet Census toolkit based on the Grinder framework will be introduced.

Published in: Software

  1. 1. 1 Vulnerabilities of Machine Learning Infrastructure Sergey Gordeychik @scadasl
  2. 2. Sergey Gordeychik
      • AI and Cybersecurity Executive: Abu Dhabi, UAE
      • Visiting Professor, Cyber Security: Harbour.Space University, Barcelona, Spain
      • Bandleader
      • Cyber-physical troublemaker: SCADA Strangelove, HackingOdyssey, @scadasl
      • Ex…: Deputy CTO, Kaspersky Lab; CTO, Positive Technologies; Gartner recognized products and services
      • Program Chair, PHDays Conference, Moscow
  3. 3. Disclaimer: Please note that this talk is by Sergey and the Hacking Odyssey group. We don't speak for our employers. All the opinions and information here are our responsibility. So, mistakes and bad jokes are all OUR responsibility.
      • Hacking Odyssey Group: Sergey Gordeychik, Anton Nikolaev, Denis Kolegov, Maria Nedyak, Roman Palkin
      • Hacking Odyssey Projects: Grinder Framework, AISec, DICOM Sec, SD-WAN New Hop
  4. 4. 4
  5. 5. 5 PWN? Adversarial example anyone?
  6. 6. 6 Adversarial example?
  7. 7. 7
  8. 8. 8
  9. 9. 9
  10. 10. 10
  11. 11. 11 Hacking as usual…
  12. 12. 12 Spherical AI traveling in a vacuum?
  13. 13. 13 What is Cyber? What is Cybersecurity?
  14. 14. 14 Cybersecurity goals? HOLY CIA TRINITY
  15. 15. 15 OT/ICS/SCADA Security?! SCADA Security Basics: Integrity Trumps Availability, ISA/IEC 62443-2-1 standards (formerly ISA-99) Marina Krotofil, Damn Vulnerable Chemical Process 2014_Krotofil.pdf
  16. 16. 16 Machine Learning and AI? AI security
  17. 17. 17 Upside down?
  18. 18. 18
  19. 19. 19 James Mickens, Harvard University, USENIX Security '18-Q: Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible?
  20. 20. 20 Mission-centric Cybersecurity (Gapanovich, Rozenberg, Gordeychik, "Signalling cyber security: the need for a mission-centric approach"): a process that ensures control object operation with no dangerous failures or damage, but with a set economic efficiency and reliability under adversarial anthropogenic information influence
  21. 21. 21 But what about?... dangerous failures? economic efficiency? reliability level?
  22. 22. 22
  23. 23. 23 But what about?... dangerous failures? economic efficiency? reliability level? Build the Threat Model First!
  24. 24. 24 AI Threat Model Li, K. (n.d.). Reverse Engineering AI Models.
  25. 25. 25 But what about?... Cloud, AUC/ROC, Privacy, IP protection, Federated learning, Insane androids?… AI security
  26. 26. 26 NCC Group, Building safer machine learning
  27. 27. 27 What is AI Infrastructure?
  28. 28. 28 You should scan all these Internets for AI
  29. 29. 29 Grinder Framework
  30. 30. AIFinger Project: The goal of the project is to provide tools and results of passive and active fingerprinting of Machine Learning Frameworks and Applications using a common Threat Intelligence approach, and to answer the following questions:
      • How can ML backend systems be detected on the Internet and on enterprise networks?
      • Are ML apps secure at Internet scale?
      • What is the general security level of ML apps at the present time?
      • How long does it take to patch vulnerabilities and apply security updates to ML backend systems deployed on the Internet?
      Contributors: Sergey Gordeychik, Anton Nikolaev, Denis Kolegov, Maria Nedyak
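A minimal sketch of the passive fingerprinting idea, assuming the HTTP response body of a candidate host has already been collected. The signature patterns and the `fingerprint` helper are illustrative placeholders, not the project's actual Grinder queries.

```python
import re
from typing import Dict, List

# Hypothetical signatures: product name -> regex expected in an HTTP response body.
# These patterns are assumptions for illustration, not AIFinger's real fingerprints.
SIGNATURES: Dict[str, str] = {
    "TensorBoard": r"<title>\s*TensorBoard\s*</title>",
    "NVIDIA DIGITS": r"<title>[^<]*DIGITS[^<]*</title>",
    "Kibana": r"<title>\s*Kibana\s*</title>",
}


def fingerprint(body: str) -> List[str]:
    """Return the names of all products whose signature matches the body."""
    return [
        name
        for name, pattern in SIGNATURES.items()
        if re.search(pattern, body, flags=re.IGNORECASE)
    ]
```

A real survey would combine several signals per product (headers, favicon hashes, API paths) to keep false positives down.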
  31. 31. AIFinger Project Coverage
      • Frameworks: TensorFlow, NVIDIA DIGITS, Caffe, TensorBoard, Tensorflow.js, brain.js, Predict.js, ml5.js, Keras.js, Figue.js, Natural.js, neataptic.js, ml.js, Clusterfck.js, Neuro.js, Deeplearn.js, Convnet.js, Synaptic.js, Apache mxnet
      • Databases with ML Content: Elasticsearch with ML data, MongoDB with ML data, Docker API with ML data
      • Databases: Elasticsearch, Kibana (Elasticsearch Visualization Plugin), Gitlab, Samba, Rsync, Riak, Redis, Redmon (Redis Web UI), Cassandra, Memcached, MongoDB, PostgreSQL, MySQL, Docker API, CouchDB
      • Job and Message Queues: Alibaba Group Holding AI Inference, Apache Kafka Consumer Offset Monitor, Apache Kafka Manager, Apache Kafka Message Broker, RabbitMQ Message Broker, Celery Distributed Task Queue, Gearman Job Queue Monitor
      • Interactive Voice Response (IVR): ResponsiveVoice.JS, Inference Solutions
      • Speech Recognition: Speech.js, dictate.js, p5.speech.js, artyom.js, SpeechKITT, annyang
      Measuring Artificial Intelligence and Machine Learning Implementation Security on the Internet
  32. 32. 32 Results (April 2020)
  33. 33. 33 Databases
  34. 34. 34 Dockers
  35. 35. 35 NVIDIA DIGITS: training logs, datasets, model design
  36. 36. 36 TensorBoard: …, everything, + vulns. "The TensorFlow server is meant for internal communication only. It is not built for use in an untrusted network." More than 120 results in total
  37. 37. Kubeflow
  38. 38. June 2020 security-risk/ A large-scale campaign against Kubernetes and Kubeflow clusters abused exposed Kubernetes dashboards to deploy a cryptocurrency miner. "...observed deployment of a suspect image from a public repository on many different clusters. The image is ddsfdfsaadfs/dfsdf:99. By inspecting the image’s layers, we can see that this image runs an XMRIG miner:"
  39. 39. 39 How to find an ML server on the Internet?
  40. 40. 40 GPGPU?
  41. 41. 41 Crypto currency on GPGPU in 2019?
  42. 42. 42 DGX-1
      • 8 Tesla V100-32GB
      • TFLOPS (deep learning): 1000
      • CUDA Cores: 40,960
      • Tensor Cores: 5,120
      • $130,000
      • Good hashcat rate :) NetNTLMv2: 28912.2 MH/s; MD5: 450.0 GH/s; SHA-256: 59971.8 MH/s; MS Office 2013: 163.5 kH/s; bcrypt $2*$, Blowfish (Unix): 434.2 kH/s
  43. 43. 43 Other things?
  44. 44. 44 SNMPWALK
  45. 45. 45 Ok, let’s scan!
      Nmap scan report for X.X.X.X
      Host is up (0.010s latency).
      Not shown: 991 closed ports
      PORT      STATE     SERVICE          VERSION
      22/tcp    open      ssh              OpenSSH 6.0p1 Debian 4 (protocol 2.0)
      80/tcp    open      http             lighttpd
      427/tcp   open      svrloc?
      443/tcp   open      ssl/http         lighttpd
      623/udp   open      ipmi
      554/tcp   filtered  rtsp
      1723/tcp  filtered  pptp
      5120/tcp  open      barracuda-bbs?
      5988/tcp  open      wbem-http?
      5989/tcp  open      ssl/wbem-https?
  46. 46. 46 CVE-2013-4786 - 2019
  47. 47. 47 Use c0mp13x passwords!
  48. 48. 48 I have only one question! Why is it still enabled by default in 2020? What do you need a helmet for? How will a complex password help?!!
  49. 49. 49 Strange certificate: Issued by Quanta Computers Inc? 128-byte (1024-bit) RSA key?.. Issued 17 April 2017… Same serial across the Internet!!!
  50. 50. 51 Find and decode firmware: Google for Quanta Computers BMC firmware → binwalk → 7-zip → Voilà
  51. 51. 52 Grep the cert and keys: TLS services on the BMC use RSA 1024 with weak ciphers and default Diffie-Hellman primitives. The private and public keys are hardcoded in the firmware and are the same for many instances of Quanta Computers BMC, including NVIDIA DGX-1. Both keys can be found unencrypted in the firmware. This allows an attacker to passively decrypt network communications without MITM conditions.
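One way to spot this kind of key reuse in scan data is to group hosts by certificate fingerprint. The sketch below assumes the DER-encoded certificates have already been collected; the function name and the input format are illustrative, not part of the original research tooling.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_cert_fingerprint(
    observations: Iterable[Tuple[str, bytes]],
) -> Dict[str, List[str]]:
    """Map the SHA-256 fingerprint of a DER certificate to the hosts presenting it.

    Only certificates seen on more than one host are returned, since those
    are the candidates for a key shared across devices.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, der_cert in observations:
        groups[hashlib.sha256(der_cert).hexdigest()].append(host)
    return {fp: hosts for fp, hosts in groups.items() if len(hosts) > 1}
```

With the standard library, the DER bytes for a host can be obtained via `ssl.get_server_certificate((host, port))` followed by `ssl.PEM_cert_to_DER_cert(pem)`.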
  52. 52. 53 Other greps? NetNTLMv2: 28912.2 MH/s MD5: 450.0 GH/s SHA-256: 59971.8 MH/s MS Office 2013: 163.5 kH/s bcrypt $2*$, Blowfish (Unix): 434.2 kH/s Can we use DGX to bruteforce DGX password hash?!
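A back-of-the-envelope estimate using the hash rates quoted above. The password policy (8 characters over a 62-symbol alphabet) is an assumption for illustration; the slides do not state which hash type or password length applies to the DGX itself.

```python
# Hash rates quoted on the slide, converted to hashes per second.
RATES = {
    "NetNTLMv2": 28912.2e6,
    "MD5": 450.0e9,
    "SHA-256": 59971.8e6,
    "MS Office 2013": 163.5e3,
    "bcrypt": 434.2e3,
}


def exhaust_seconds(alphabet_size: int, length: int, rate: float) -> float:
    """Worst-case time to try every password of exactly `length` characters."""
    return alphabet_size ** length / rate


if __name__ == "__main__":
    # Assumption: an 8-character password over [a-zA-Z0-9] (62 symbols).
    for name, rate in RATES.items():
        secs = exhaust_seconds(62, 8, rate)
        print(f"{name:15s} {secs / 86400:14.2f} days")
```

The gap between the fast hashes and bcrypt spans several orders of magnitude, which is why the choice of password storage scheme matters more than the hardware.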
  53. 53. 54 Or just ask Google?!
  54. 54. 55 IPMI passwords /conf/BMC1/IPMIConfig.dat
  55. 55. 56 Looks like encryption
  56. 56. 57 …and decryption BlowFish without IV is used as implemented in Hint:
  57. 57. 58 Lesson learned • Please don’t use one-way hashing with salt. Use plaintext or reversible encryption. • The password encryption key should be hardcoded and stored in the same folder as the user database. • It is important to keep it like the product name. • Store it in several places across the filesystem for resilience.
  58. 58. 59 Hardcoded RC4 Key in JViewer-SOC • JViewer-SOC (KVM and IPMI applet) uses the RC4 cipher with a hardcoded key for traffic encryption. • The JViewer-SOC Java applet package contains a Decoder class. • This class defines a DecodeKeys constant equal to “fedcba9876543210”. • The constant is used to initialize the RC4 key scheduling (expansion) algorithm. This allows an attacker to bypass security features, decrypt traffic and extract sensitive information.
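To show why a hardcoded RC4 key gives no confidentiality, here is a plain textbook RC4 in Python. Anyone who knows the key can regenerate the keystream and decrypt. Treating the quoted constant as ASCII bytes, and how the applet frames its traffic, are assumptions not confirmed by the slides.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling (KSA) followed by keystream generation (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # KSA: permute the state using the key
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # PRGA: XOR each data byte with a keystream byte
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Key string quoted on the slide; interpreting it as ASCII bytes is an assumption.
HARDCODED_KEY = b"fedcba9876543210"
```

Because RC4 is a stream cipher, the same function encrypts and decrypts: `rc4(HARDCODED_KEY, rc4(HARDCODED_KEY, msg)) == msg`.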
  59. 59. 60 Insecure random number generator in RAKP/AES • JSOL.jar/com/ami/jsol/common/ defines the functions random4ByteArray and random16ByteArray. • They use the java.util.Random class. • These functions are used within the RAKP crypto protocol implementation. • According to its specification, RAKP is based on Bellare-Rogaway protocols. • The issue is that these protocols require cryptographically secure random numbers. The same function is used to generate the IV for AES encryption in the processEncryption function of the IPMISession class.
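The weakness of java.util.Random can be illustrated with its documented 48-bit linear congruential generator: two consecutive 32-bit outputs leave only 16 unknown state bits, which can be brute-forced instantly. The sketch assumes the attacker observes two consecutive `nextInt()` values as unsigned 32-bit integers; how the BMC code exposes them on the wire is not covered here.

```python
MULT = 0x5DEECE66D        # multiplier documented for java.util.Random
ADD = 0xB                 # increment documented for java.util.Random
MASK = (1 << 48) - 1      # the generator keeps 48 bits of state


def next_state(state: int) -> int:
    """Advance the LCG by one step."""
    return (state * MULT + ADD) & MASK


def output(state: int) -> int:
    """Unsigned 32-bit value that nextInt() derives from a 48-bit state."""
    return state >> 16


def recover_state(out1: int, out2: int) -> int:
    """Return the internal state behind out2, given two consecutive outputs."""
    for low in range(1 << 16):            # guess the 16 hidden low bits
        candidate = (out1 << 16) | low
        following = next_state(candidate)
        if output(following) == out2:
            return following
    raise ValueError("outputs are not consecutive java.util.Random values")
```

Once the state is recovered, `output(next_state(state))` predicts every later value, which is why such a generator is unsuitable for nonces and IVs.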
  60. 60. 61 CSRF is not an issue…. A Cross-Site Request Forgery (CSRF) vulnerability was found in the Nvidia BMC Web Service. It allows an attacker to force an authenticated user to invoke API endpoints within the web application. The following internal endpoints require an authenticated session but do not require a CSRF token: /rpc/getsessiontoken.asp /rpc/getrole.asp /rpc/getadvisercfg.asp /rpc/getvmediacfg.asp /rpc/flash_browserclosed.asp /rpc/getvideoinfo.asp /rpc/downloadvideo.asp /rpc/restarthttps.asp
  61. 61. 62 Unrestricted SignImage key upload: The SignImage upload feature in the DGX-1 BMC accepts any well-formed RSA 1024 public key without verifying its authenticity. The upload routine, implemented in the WebValidateSignImageKey function, stores the key in /conf/public.pem. The CheckImageSign function, implemented in …, uses public.pem to verify the firmware signature. An attacker who can upload a key can therefore make the BMC accept firmware signed with it.
  62. 62. 63 Unrestricted File Upload through CSRF: The web-server handler defines a stripped function at address 0x8BE0. This function is called when an authorized user sends a POST request to /page/file_upload.html. If the POST request is multipart/form-data, the function checks the file argument: if its name doesn’t end with a ‘/’ symbol, it looks up the file path in a hardcoded file-argument-name-to-file-path mapping. However, if the argument name ends with ‘/’, the file is saved at the path formed by the argument name followed by the filename. Thus it is possible to upload custom files and overwrite existing ones at a user-defined absolute path. Example attack vector: overwrite the ./shadow or ./passwd file in the “/conf/” folder to create or modify users and/or replace the default shell to get remote root access via ssh. The vulnerability can be exploited via CSRF.
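The path-selection flaw described above can be modelled in a few lines. This is a reconstruction from the slide text only: the mapping contents and both function names are hypothetical, and the real handler is native firmware code, not Python.

```python
from typing import Dict, Optional

# Hypothetical mapping; the real firmware table is not reproduced on the slide.
UPLOAD_TARGETS: Dict[str, str] = {"firmware_image": "/tmp/upload/fw.bin"}


def vulnerable_target(arg_name: str, filename: str) -> Optional[str]:
    """Model of the flawed logic as described: a trailing '/' bypasses the mapping."""
    if arg_name.endswith("/"):
        # The attacker controls both parts, hence the full destination path.
        return arg_name + filename
    return UPLOAD_TARGETS.get(arg_name)


def strict_target(arg_name: str, filename: str) -> Optional[str]:
    """One possible fix: accept only argument names present in the mapping."""
    return UPLOAD_TARGETS.get(arg_name)
```

With the flawed logic, an argument named `/conf/` and a filename of `shadow` resolve to `/conf/shadow`, matching the attack vector on the slide; the strict variant rejects the request.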
  63. 63. 64 Attack
  64. 64. 65 List of fixes
      • AISec-NV-2019-01 - Hardcoded admin user (CVE-2020-11483)
      • AISec-NV-2019-03 - SNMP with well-known community strings enabled by default (CVE-2020-11489)
      • AISec-NV-2019-04 - Hardcoded RSA keys and self-signed certificate for TLS (CVE-2020-11487)
      • AISec-NV-2019-10 - Insecure random number generator in RAKP/AES (CVE-2020-11616)
      • AISec-NV-2019-11 - Hardcoded RC4 Key in JViewer-SOC (CVE-2020-11615)
      • AISec-NV-2019-15 - Internal methods are vulnerable to CSRF attack (CVE-2020-11485)
      • AISec-NV-2019-16 - Unrestricted File Upload through CSRF (CVE-2020-11486)
      • AISec-NV-2019-17 - Hardcoded IPMI passwords encryption key (CVE-2020-11484)
      • AISec-NV-2019-18 - Unrestricted SignImage key upload (CVE-2020-11488)
      Credits: Sergey Gordeychik, Maria Nedyak, Denis Kolegov, Roman Palkin
  65. 65. 66 Other things?
  66. 66. 67 Any bugs there? We don’t know yet
  67. 67. 68 Disclosure timeline
      • Tue, 3 Sep 2019, 16:42 - Initial submission
      • Thu, 19 Sep 2019, 00:40 - List of Internet-facing DGXs collected by Grinder
      • Sun, 22 Sep 2019, 23:05 - Ack and workaround discussion
      • Sat, 5 Oct 2019, 19:50 - Remote root submission
      • Tue, 17 Dec 2019, 21:00 - Call with Alex Matrosov to discuss soooo responsible disclosure
      • Feb 2020 - COVID-19 outbreak, cancellation of PHDays and OFFZONE
      • April - Aug 2020 - GradeZero Rock’n’roll
      • Tue, 25 Aug, 21:10 - Failed fix (QA issues)
      • Now - Fixes, initial disclosure @CodeBlue 2020
      Kudos to Alex, Shawn, NVIDIA PSIRT
  68. 68. 69 Supply chain is a pain
      • Megarac SP (DGX-1): Quanta Computer Inc., IBM (BMC Advanced System Management), Lenovo (ThinkServer Management Module), Hewlett Packard Enterprise Megarac, Mikrobits (Mikrotik)
      • Megarac SP-X (DGX-2): Netapp, ASRockRack IPMI, ASUS ASMB9-iKVM, DEPO Computers, TYAN Motherboard, Gigabyte IPMI Motherboards, Gooxi BMC
  69. 69. 70 Takeaways • Big Thing doesn’t mean good security • Good AI researchers are bad cybersec pros • All vulnerabilities are important • Supply chain is a pain • Things are better with Grinder
  70. 70. 71 Infection of the AI models
  71. 71. More parameters -> Longer training
  72. 72. Pre-trained model workflow: 1. Model interface (some wrapper, CLI, etc.): .py / .sh / etc. 2. Download the weights in some form: .pb / .h5 / .pth, .json / .yml / .csv 3. Run the model
  73. 73. Distribution •~ 2k repos on github •~ 100 repos on gitlab •~ 500 models on
  74. 74. Documentation Whole model Weights only PyTorch model (.pth)
  75. 75. Reality Whole model Weights only
  76. 76. Step 1. Find an existing model
  77. 77. Step 2. Infect it! • Overwrite the magic number • `Classic` Pickle payload • Python code to execute on load • Shell code to run on load
  78. 78. Python Pickle Injection
      • pickle is a Python module used to serialize an object to a byte string and to store it in or load it from a file.
      • Pickle is a simple stack language, which means the unpickler has a variable stack. Every time it finishes deserializing an object, it stores it on the stack. When it reaches a '.' while deserializing, it pops the top object from the stack and returns it.
      • Besides, pickle has a temporary memo, like a clipboard. 'p0', 'p1' put the top object of the stack into the memo and refer to it as '0' or '1'; 'g0', 'g1' get object '0' or '1' back from the memo.
      • Pickle has two implementations (in Python 2): pickle and cPickle. They differ in some specifics, such as the methods they offer, but in most cases they act in the same way.
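The stack-machine behaviour can be observed with the standard `pickletools` module. In the sketch below, the `Demo` class is a harmless stand-in for a payload (unpickling it just calls `int("42")`), and the list of opcodes treated as suspicious is a judgment call made for this example, not an official classification.

```python
import pickle
import pickletools
from typing import List

# Opcodes that import a global or call a callable during unpickling.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


class Demo:
    """Harmless stand-in for a payload: unpickling calls int("42")."""

    def __reduce__(self):
        # (callable, args): the unpickler will invoke callable(*args) on load.
        return (int, ("42",))


def suspicious_opcodes(data: bytes) -> List[str]:
    """List opcodes in a pickle stream that can trigger imports or calls."""
    return [
        op.name
        for op, _arg, _pos in pickletools.genops(data)
        if op.name in SUSPICIOUS
    ]
```

`pickle.loads(pickle.dumps(Demo()))` returns the result of the call rather than a `Demo` instance. Note that legitimate model checkpoints also contain these opcodes, so such a scan flags files for review rather than proving malice.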
  79. 79. Step 3. Upload it: link to our malicious file
  80. 80. • Just one command to run from anywhere! • torch.hub.load("ChickenDuo/top", "model")
  81. 81. 83
  82. 82. Cross-platform -> Another approach 84
  83. 83. Serialization (diagram): SavedModel = Graph File (.pb) + Variables + Assets; labels: Constants and static, Logic
  84. 84. Custom serialization
      • Protobuf format (.pb)
      • ~1300 operations (math, conditionals, statistics, etc.)
      • Only TWO of them were found dangerous: WriteFile (any text, any file) and ReadFile (any file)
      • Looks like Google is aware of them
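A simple check for the two file operations named above can be written against any sequence of graph nodes. The `Node` tuple here is a stand-in exposing only the `name` and `op` fields that a TensorFlow `NodeDef` also has; with TensorFlow installed, the same function could be called as `find_file_ops(graph_def.node)`. Function names are illustrative.

```python
from typing import Iterable, List, NamedTuple, Tuple

# The two op types the talk identifies as dangerous.
FILE_OPS = {"ReadFile", "WriteFile"}


class Node(NamedTuple):
    """Stand-in exposing the two NodeDef fields used here: name and op."""

    name: str
    op: str


def find_file_ops(nodes: Iterable) -> List[Tuple[str, str]]:
    """Return (node name, op type) for nodes whose op reads or writes files."""
    return [(n.name, n.op) for n in nodes if n.op in FILE_OPS]
```

This only inspects top-level nodes; ops nested inside the graph's function library would need a separate pass.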
  85. 85. Graph serialization (diagram): original graph: Some ops, Tensor, Result; infected graph: Some ops, Tensor, "Payload > result?" with True/False branches, Payload ops, Result
  86. 86. Code Read the existing graph and rename the “ending” tensor Execute func to determine which route to take (tensor or tensor) Write it all back
  87. 87. Wrapper Check if file exists Append our payload to a file
  88. 88. Wrapper Check if file exists Append our payload to a file
  89. 89. Keras model serialization: SavedModel; Keras with h5; Weights only; Model from config
  90. 90. Serialization with topology - Only Keras layers (Functional model) - … has a Lambda layer, which serializes a custom Python function with marshal ( team/keras/blob/master/keras/layers/ - No warning on launching third-party models!
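Since the model topology is stored as a JSON config, Lambda layers can be detected before the model is ever instantiated. The sketch assumes the config has already been extracted as a JSON document (for HDF5 files it is typically stored in a `model_config` attribute, which would need a library such as h5py to read); the function name is illustrative.

```python
import json
from typing import Any, List


def find_lambda_layers(config: Any) -> List[str]:
    """Recursively collect names of layers whose class_name is 'Lambda'."""
    found: List[str] = []
    if isinstance(config, dict):
        if config.get("class_name") == "Lambda":
            inner = config.get("config")
            name = inner.get("name") if isinstance(inner, dict) else None
            found.append(str(name) if name is not None else "<unnamed>")
        for value in config.values():
            found.extend(find_lambda_layers(value))
    elif isinstance(config, list):
        for item in config:
            found.extend(find_lambda_layers(item))
    return found


def find_lambda_layers_in_json(text: str) -> List[str]:
    """Convenience wrapper for a config supplied as a JSON string."""
    return find_lambda_layers(json.loads(text))
```

A non-empty result means the file carries serialized Python code and deserves manual review before loading.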
  91. 91. Example 94
  92. 92. Timeo Danaos et dona ferentes `torch.load()` uses ``pickle`` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source, or that could have been tampered with. **Only load data you trust**.
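One mitigation pattern described in the Python `pickle` documentation is to restrict which globals an unpickler may resolve. The sketch below follows that pattern with a deliberately tiny allow-list; a real model checkpoint would need a broader, carefully reviewed list, and this is not a drop-in replacement for `torch.load`.

```python
import builtins
import io
import pickle

# Deliberately small allow-list of harmless builtins.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream asks for; refuse everything
        # that is not explicitly allowed.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but only allow-listed globals can be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

A stream that references `os.system` is rejected in `find_class` before anything is called, while plain containers and allow-listed types load normally.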
  93. 93. 96 Hacking Medical Imaging
  94. 94.
  95. 95. Face recognition • 170,000 cameras across the city • Face recognition system based on FindFace technology • The current face recognition system operates on "black lists" (criminals, missing people) • The system does not compare all people caught on camera with all residents of Moscow!
  96. 96. Let’s check it out! • Segmentation does not work • Or works, but with poor accuracy • Questions • The presence of a biometric DB • The relevance of the biometric DB • Biometric attacks • Use of masks, etc. • False positive handling
  97. 97. Biometric DB • White list (anyone you can): upload photos via the app • Blacklist (not allowed): register when COVID is detected; other citizens ??? • Where to get it? How to compare with the person?
  98. 98. Biometrics attacks • Presentation attack (liveness) • Morphing attack • CV dazzle • Aging effect
  99. 99. Jan Krissler, “Ich sehe, also bin ich ... Du” 
  100. 100. 103 Small ad
  101. 101. 104 What can we do?
      • For Researchers: AI cybersecurity is a green field, from SDN to model privacy, from secure SDL to adversarial robustness
      • For Enterprises: Don’t trust AI if adversarial “input” is possible. AI IS NOT a spherical model traveling in a vacuum!
      • For Governments: Centralize data and annotation. Force vendors to follow security best practices from the beginning. Detect and control AI-based abuses