4. Outline
[Figure: taxonomy of malware detection methods. Labels: Machine Learning (ML); Signature based, Behavior based, Heuristic based; System call, CFG, …; emulator, monitor, virtual machine, sandbox]
5. Signature-Based Detection
This technique searches for sequences of bytes
that identify a particular piece of malicious
software. A signature is composed of a
particular sequence of code or data. The
signature is stored in a database against which
scanned files are compared. This is the most
common method used to detect malware, since
it produces a small error rate.
[0]
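The lookup described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the signature names and byte patterns are invented for illustration, and a real engine would use far more efficient multi-pattern matching than a plain substring test.

```python
# Toy signature scanner. SIGNATURE_DB and its entries are hypothetical,
# invented for this example; they do not describe any real malware.
SIGNATURE_DB = {
    "Demo.Sample.A": bytes.fromhex("deadbeef4142"),
    "Demo.Sample.B": b"EVIL_MARKER",
}


def scan(data: bytes, signature_db=SIGNATURE_DB):
    """Return the sorted names of all signatures whose byte sequence occurs in data."""
    return sorted(name for name, pattern in signature_db.items() if pattern in data)


clean = b"hello world"
infected = b"\x00\x01" + bytes.fromhex("deadbeef4142") + b"\x90\x90"

print(scan(clean))     # no signature matches
print(scan(infected))  # the Demo.Sample.A byte sequence is present
```

The scanner reports a match only when the exact byte sequence is present, which is why any change to those bytes defeats it.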
8. Encryption Methods
• Exclusive OR (XOR)
• ROT13
• Base64 encoding
• AES
• Code Packing
• …
• Oligomorphism
– The purpose of this technique is to produce a
different decryptor on every new infection.
[0][19]
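To see why even the simplest encodings in this list matter for byte-sequence signatures, the sketch below encodes one harmless marker string three ways and checks whether the original bytes still appear. The marker and the XOR key are assumptions chosen for illustration only.

```python
import base64
import codecs


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# Hypothetical byte pattern that a signature database might contain.
marker = b"EVIL_MARKER"

encodings = {
    "xor(0x5A)": xor_bytes(marker, 0x5A),
    "rot13": codecs.encode(marker.decode("ascii"), "rot_13").encode("ascii"),
    "base64": base64.b64encode(marker),
}

for name, encoded in encodings.items():
    # True would mean a plain byte-sequence signature still matches.
    print(name, encoded, marker in encoded)
```

In each case the stored bytes differ from the marker, so a scanner has to decode or execute the sample before a byte signature can match.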
23. Signature-Based AV is Dead
24. Behavior-Based Detection
A behavior-based detector consists of the
following components:
• Data Collector
– Collects dynamic / static information
about the execution
• Interpreter
– Converts the raw information gathered by the
data collector into intermediate representations
• Matcher
– Compares these representations with the
behavior signatures [7]
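The three components can be sketched as a small pipeline. Everything concrete here is an assumption made for illustration: the trace format, the choice of an ordered call-name list as the intermediate representation, and the two behavior signatures.

```python
# Data collector output (hypothetical format): (native API name, argument) pairs.
RAW_TRACE = [
    ("NtCreateFile", "C:\\Users\\demo\\a.tmp"),
    ("NtQueryInformationFile", "C:\\Users\\demo\\a.tmp"),
    ("NtWriteFile", "C:\\Users\\demo\\a.tmp"),
    ("NtSetInformationFile", "C:\\Users\\demo\\a.tmp"),
]

# Invented behavior signatures: ordered call names that must appear in the trace.
BEHAVIOR_SIGNATURES = {
    "create-then-write": ["NtCreateFile", "NtWriteFile"],
    "registry-persistence": ["NtOpenKey", "NtSetValueKey"],
}


def interpret(raw_trace):
    """Interpreter: reduce raw events to an intermediate representation (call names in order)."""
    return [name for name, _arg in raw_trace]


def matches(sequence, behavior_signature):
    """Matcher: True if behavior_signature occurs as an ordered subsequence of sequence."""
    remaining = iter(sequence)
    # 'call in remaining' consumes the iterator, so order is enforced.
    return all(call in remaining for call in behavior_signature)


def detect(raw_trace):
    """Run interpreter and matcher; return the names of all matching signatures."""
    representation = interpret(raw_trace)
    return [name for name, sig in BEHAVIOR_SIGNATURES.items() if matches(representation, sig)]


print(detect(RAW_TRACE))
```

Subsequence matching tolerates unrelated calls between the signature's calls; stricter or graph-based representations are equally possible choices for the interpreter.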
25. The Main Advantage of the
Behavior-Based Detection
“is the ability to detect the type of malwares
that signature base techniques are unable to
detect such as unknown and polymorphic
(behavior) malware variants. On the other hand,
non-availability of promising False Positive Ratio
(FPR) and also high amount of scanning
(database) time are the main disadvantages of
these behavior based malware detection
methods”
[7]
26. Improving Detection Performance
28. Running the Malware in an Isolated,
Monitored Environment (Golden Image)
29. The AV Detection Flow
with Isolation and Monitoring
“The principle is executing the malware inside a
controlled environment in order to trigger the
unpacking of the executable in memory, detect the end
of the unpacking process by either using automated
unpacker or by monitoring the execution of writable
memory sections. Once the unpacking process is
detected, the collected data is re-run using Static-based
analysis or fed to the heuristic engine.”
– Cuckoo sandbox (open source)
– …
[2]
31. Notes on AV with Isolation and
Monitoring
• Where to trigger the malware
• A sandbox monitors software behavior in an
isolated environment implemented with
emulator-based or hardware-virtualization-based
methods (isolation)
32. Evading an Isolation Environment
• AV detector (Detect sandbox-inherent
characteristics)
• Detect virtualization
[11]
33. Evading an Isolation Environment:
AV Detector
main() {
    …
    if (AV is detected) {
        main();          // re-enter main(); the payload is never decrypted
    } else {
        decrypt();       // decrypt only when no AV is detected
        run_malware();
    }
}
34. One Concept for Implementing an
AV Detector
Fingerprinting sandboxes
[12]
35. Categories of Fingerprinting
Sandboxes
• Timing
– Sandboxes run the file only for a limited time
• File
– e.g. A test case examines the number of installed programs
through the Windows registry, as well as the number of
recently modified files, which is expected to be zero
in a sandbox
• Process
– e.g. Fewer processes run in a sandbox (a clean installation
of Windows) than on a regular client
• CPU instruction
– e.g. Incorrectly emulated instructions [9]
36. Categories of Fingerprinting
Sandboxes
• Environment
– e.g. Users are expected to have password-protected their accounts,
while sandboxes most likely have not
• Hardware
– e.g. Resources are expected to be higher on a user client than on a
sandbox running inside a virtual machine
• Network
– e.g. The attacker reads the data returned and checks whether the first
four bytes are "<!do". This string is likely the start of the "<!doctype
html>" tag found at the beginning of the Google website (and others).
The source reports that most of the sandbox programs it checked that
mimic the Internet serve an HTML page without the "<!doctype
html>" tag.
• … [9][13]
38. “Since the sandboxes were treated as black
boxes during the tests, no research could be
done” [9]
Room for further work, e.g. [14]
AjMaChInE: perhaps a heuristic method could be built
39. P: The AV detector is very powerful
S: “Facing the problem of evasion, researchers
proposed new techniques to analyze the hidden
behavior of evasive malware. A generic approach
to counter evasive behavior is exploring multiple
execution paths” [11], e.g. turning a greater-than
operation into a less-or-equal, i.e. modifying conditional
branches
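A minimal sketch of exploring multiple execution paths by overriding conditional branches, seen from the analyst's side. The toy program, its single "uptime > 10" check, and the `branch` hook are all assumptions for illustration; real systems instrument machine code rather than cooperating Python functions.

```python
from itertools import product


def sample_program(branch):
    """Toy program; branch(name, natural_outcome) lets the analyzer override a condition."""
    log = []
    uptime_minutes = 3  # value observed inside the analysis environment
    if branch("uptime > 10", uptime_minutes > 10):
        log.append("hidden behavior")
    else:
        log.append("benign behavior")
    return log


def explore(program, branch_names):
    """Run the program once per combination of forced branch outcomes."""
    results = {}
    for outcomes in product([False, True], repeat=len(branch_names)):
        forced = dict(zip(branch_names, outcomes))
        # Unlisted branches keep their natural outcome.
        results[outcomes] = program(lambda name, natural: forced.get(name, natural))
    return results


natural_run = sample_program(lambda name, natural: natural)
paths = explore(sample_program, ["uptime > 10"])

print(natural_run)
for outcomes, log in paths.items():
    print(outcomes, log)
```

The natural run only shows the benign path, while forcing the comparison the other way exposes the hidden one. The number of runs doubles with every overridden branch, which is the main cost of this approach.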
42. The Disadvantage of Symbolic
Execution [15]
• Limitation of symbolic execution in unrolling
loops
– Typically, a loop is unrolled only a fixed number
of times, or for a fixed amount of time, to
approximate the analysis
• Unsolved conjectures
– Unproven propositions or theorems that
appear to be correct and have not been disproven
– e.g. 3x + 1 conjecture, 5x+1 conjecture, 7x+1
conjecture, Matthews conjecture, Juggler
sequence, etc.
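The loop-unrolling limitation can be illustrated with the 3x + 1 iteration itself. The function below is a plain concrete loop with an unroll budget, not a symbolic executor; `max_unroll` is a hypothetical stand-in for the fixed bound an analysis would impose.

```python
def collatz_steps(n: int, max_unroll: int):
    """Count 3x + 1 iterations until n reaches 1; return None if the unroll budget runs out."""
    steps = 0
    while n != 1:
        if steps == max_unroll:
            return None  # the bounded analysis gives up before the loop exits
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps


print(collatz_steps(6, 10))    # 8: 6, 3, 10, 5, 16, 8, 4, 2, 1
print(collatz_steps(27, 100))  # None: 27 needs more than 100 iterations
print(collatz_steps(27, 200))  # 111
```

Because no general bound on the iteration count is known for such sequences, any fixed budget can be exceeded by some input, so code guarded by the loop's exit may never be reached by the analysis.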
58. Replacement Attack Arsenal -
Inserting Redundant Dependencies
• “NtSetInformationFile” can replace the
dependencies that use FileHandle as the medium, as
illustrated in the figure above.
• “NtDuplicateObject” returns a duplicated object
handle, which refers to the same object as the
original handle.
• With “void *address” as the medium, we can insert
“NtQueryVirtualMemory” or
“NtReadVirtualMemory”, which do not affect the
mapped memory address.
59. Replacement Attack Arsenal -
Inserting Redundant Dependencies
• “NtQuery*” attack. There are several Windows native APIs for
querying information about kernel objects, such as
“NtQueryAttributesFile”, “NtQueryKey”,
“NtQueryInformationProcess” and “NtQueryInformationFile”.
All of these query APIs take an object handle as one of their
input arguments and output object information. They introduce
no modification to the kernel objects. Hence
“NtQuery*” native APIs are good candidates for our
replacement attacks.
– For example,
• NtCreateFile → NtSetInformationFile
• NtCreateFile → NtQueryInformationFile (“FileInformation”) →
NtSetInformationFile
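The effect of the example above on a dependency-based specification can be modeled with a toy call chain. This is a deliberately simplified assumption: each chain is a list of calls linked through one shared file handle, whereas the cited work operates on full system call dependency graphs.

```python
def edges(chain):
    """Direct dependency edges of a call chain linked through one shared handle."""
    return set(zip(chain, chain[1:]))


def has_path(chain, src, dst):
    """True if dst appears after src in the chain, i.e. the indirect dependency still exists."""
    return src in chain and dst in chain[chain.index(src) + 1:]


original = ["NtCreateFile", "NtSetInformationFile"]
mutated = ["NtCreateFile", "NtQueryInformationFile", "NtSetInformationFile"]

# Hypothetical specification: one direct edge that the detector expects to see.
spec_edge = ("NtCreateFile", "NtSetInformationFile")

print(spec_edge in edges(original))   # True: direct edge present
print(spec_edge in edges(mutated))    # False: the inserted query call splits the edge
print(has_path(mutated, *spec_edge))  # True: the handle still flows between the two calls
```

The inserted query call changes the set of direct edges without changing what happens to the file, which is the property the slide attributes to “NtQuery*” APIs.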
60. Replacement Attack Arsenal - Sub-
SCDG mutations
• Replication (copying behavior). For example, we can
copy a file by calling “NtReadFile” and
“NtWriteFile” instead of using memory as the
medium
• Modify registry for persistence [18]
• Remote code injection
• In short, change the code and its structure.
67. So far, the Bot vs. Bot work we have seen
only deals with the signature contest; we have not
yet seen it handle behavior evasion.
68. Cat-and-Mouse Game
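• AjMaChInE: unknown techniques (unlimited possibilities)
> known ones, so I do not believe AV can solve this with AI.
– Signature-based is dead
– The AV detector is very powerful
– Evade system-call-based behavior analysis
• Community friends: add intuition to defensive AI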
69. Reference
• [0] 2017, Jhonattan J. Barriga A, etc. “Malware Detection and Evasion with
Machine Learning Techniques: A Survey”
• [1] 2018, Andrea Fortuna, “Malware hiding and evasion techniques”
• [2] 2014, Arne Swinnen, etc. “One packer to rule them all: Empirical
identification, comparison and circumvention of current Antivirus
detection techniques”
• [3] 2017, “AMBER: Reflective PE Packer”,
https://github.com/EgeBalci/Amber
• [4] 2017, “SpookFlare: Stay in Shadows”,
https://artofpwn.com/spookflare.html
• [5] 2015, “Advanced Antivirus Evasion Techniques”,
https://github.com/gpoulios/ROPInjector
• [6] 2018, Tim Blazytko, etc. “Breaking State-of-the-Art Binary Code
Obfuscation A Program Synthesis-based Approach”
70. Reference
• [7] 2013, Zahra Bazrafshan, etc. “A Survey on Heuristic Malware Detection
Techniques”
• [8] 2016, Ctruncer, “The Art of AV Evasion - or Lack Thereof”
• [9] 2016, Gustav Lundsgard, etc. “Bypassing modern sandbox technologies:
An experiment on sandbox evasion techniques”
• [10] 2016, Jeremy Blackthorne, etc. “AVLeak: Fingerprinting Antivirus
Emulators Through Black-Box Testing”
• [11] 2016, Michael Brengel, etc. “Detecting Hardware-Assisted
Virtualization”
• [12] 2016, Michael Brengel, etc. “Evading Malware Sandboxes”
• [13] 2017, MalwareJake, “Novel malware sandbox evasion”
• [14] 2017, “Spotless Sandboxes: Evading Malware Analysis Systems using
Wear-and-Tear Artifacts”
71. Reference
• [15] 2011, Zhi Wang, etc. “Linear Obfuscation to Combat Symbolic
Execution”
• [16] 2012, Weiqin Ma, etc. “Shadow Attacks: Automatically Evading
System-Call-Behavior Based Malware Detection”
• [17] 2015, Jiang Ming, etc. “Replacement Attacks: Automatically Impeding
Behavior-based Malware Specifications”
• [18] 2013, Scott Langendorf “Windows registry persistence, part 2: The
run keys and search-order”
• [19] 2016, Andrea Fortuna, etc. “Malware obfuscation techniques: four
simple examples”
• [20] 2017, Kristian Iliev, etc. “Top 6 Advanced Obfuscation Techniques
Hiding Malware on Your Device”
• [21] 2017, Hyrum S. Anderson, etc. “Evading Machine Learning Malware
Detection”