One of the most prevalent methods attackers use to exploit vulnerabilities is ROP (Return Oriented Programming). During exploitation, code often runs very differently than it normally does: calls are made to the middle of functions, functions don’t return to their callers, etc. These control-flow anomalies could be detected if a log of all instructions executed by the processor were available.
In the past, tracing a processor’s execution incurred a significant slowdown, making such an anti-exploitation method impractical. However, recent Intel processors, such as Broadwell and Skylake, can trace execution with low overhead via a feature called Processor Trace. A similar feature called CoreSight exists on newer ARM processors.
The lecture will discuss an anti-exploitation system we built that scans files and detects control flow violations using these new processor features.
--- Ron Shina
Ron has been staring at binary code for over a decade, occasionally running it. Having spent a lot of his time doing mathematics, he enjoys searching for algorithmic opportunities in security research and reverse engineering. He is a graduate of the Israel Defense Forces’ Talpiot program. In his spare time he works on his jump shot.
--- Shlomi Oberman
Shlomi Oberman is an independent security researcher with over a decade of experience in security research. Shlomi spent many years in the attacker’s shoes for different companies and knows all too well how hard it is to stop a determined attacker. In recent years his interest has shifted from breaking things to helping stop exploits, both while software is written and after it has shipped. Shlomi is a veteran of the IDF Intelligence Corps and used to head the security research efforts at NSO Group and other companies.
2. Brought to you by
Shlomi Oberman
independent security researcher
Ron Shina
independent security researcher
3. Tracing – what executed and when?
Code optimization and profiling
◦ Sampling
◦ Instrumentation
Intel Processor Trace (PT)
4. Intel PT
Processor feature enabling instruction tracing with low overhead (the documentation says about 5%)
◦ Tens of times faster than the previous option
Available on Intel Broadwell and Skylake processors
A similar feature, Real Time Instruction Trace (RTIT), exists on certain Intel Atom processors
6. Packets
The processor writes the trace to memory as packets
Packet Types
◦ Taken / Not Taken (TNT) packets for conditional branches
◦ Target IP (TIP) packets for indirect branches
◦ Timestamp (TSC) packets
◦ …
The traced binary is needed to reconstruct the instruction trace
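Why the binary is needed can be shown with a toy decoder. This is a sketch under simplifying assumptions, not the real packet format: the Block type, the block addresses and the one-bit-per-list-entry TNT stream are hypothetical (real TNT packets pack several bits, and returns may also be compressed into TNT bits). Conditional branches consume a TNT bit, indirect branches consume a TIP target, and direct jumps consume nothing because their target is read from the code itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    """One basic block of a hypothetical disassembled binary."""
    kind: str                          # "cond", "jmp", "indirect" or "exit"
    taken: Optional[int] = None        # branch target of a "cond" or "jmp" block
    fallthrough: Optional[int] = None  # next block when a "cond" is not taken


def reconstruct(blocks: Dict[int, Block], start: int,
                tnt_bits: List[bool], tip_targets: List[int]) -> List[int]:
    """Rebuild the executed block sequence from simplified PT packets."""
    tnt, tip = iter(tnt_bits), iter(tip_targets)
    addr, path = start, [start]
    while blocks[addr].kind != "exit":
        block = blocks[addr]
        if block.kind == "cond":
            # One TNT bit tells us which of the two static successors ran.
            addr = block.taken if next(tnt) else block.fallthrough
        elif block.kind == "jmp":
            # Direct jump: no packet at all, the target comes from the binary.
            addr = block.taken
        else:
            # Indirect branch: the target is only known from a TIP packet.
            addr = next(tip)
        path.append(addr)
    return path


program = {
    0x100: Block("cond", taken=0x120, fallthrough=0x110),
    0x110: Block("jmp", taken=0x130),
    0x120: Block("indirect"),
    0x130: Block("exit"),
}
print([hex(a) for a in reconstruct(program, 0x100, [True], [0x130])])
# prints ['0x100', '0x120', '0x130']
```

Without the `program` table, the bit `True` and the address `0x130` alone say nothing about which instructions executed.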
8. Configuration options
User and/or kernel tracing
Filter by process
Start or stop the trace based on address ranges (only on later processors)
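These options are selected through the IA32_RTIT_CTL MSR. The sketch below only composes the MSR value; the bit positions are as we recall them from the Intel SDM (Vol. 3, Intel Processor Trace chapter) and should be checked there before use. Per-process filtering additionally requires writing the target CR3 to IA32_RTIT_CR3_MATCH, and address filtering requires programming the ADDR0 range MSRs; the helper name `rtit_ctl` is hypothetical, and in practice a kernel driver (for example Linux perf’s intel_pt driver) does this for you.

```python
# IA32_RTIT_CTL (MSR 0x570) bit positions as recalled from the Intel SDM;
# treat them as an assumption and verify against the SDM.
TRACE_EN = 1 << 0     # enable tracing
OS = 1 << 2           # trace while CPL == 0 (kernel)
USER = 1 << 3         # trace while CPL > 0 (user mode)
CR3_FILTER = 1 << 7   # trace only when CR3 == IA32_RTIT_CR3_MATCH (one process)
TOPA = 1 << 8         # write output through a Table of Physical Addresses
TSC_EN = 1 << 10      # emit TSC (timestamp) packets
BRANCH_EN = 1 << 13   # emit branch packets (TNT, TIP, ...)
ADDR0_CFG_SHIFT = 32  # bits 35:32; value 1 = trace only inside the ADDR0 range


def rtit_ctl(user=True, kernel=False, cr3_filter=False, addr0_filter=False):
    """Compose an IA32_RTIT_CTL value; a kernel driver would write it with WRMSR."""
    value = TRACE_EN | TOPA | TSC_EN | BRANCH_EN
    if user:
        value |= USER
    if kernel:
        value |= OS
    if cr3_filter:
        value |= CR3_FILTER
    if addr0_filter:
        value |= 1 << ADDR0_CFG_SHIFT
    return value


print(hex(rtit_ctl(cr3_filter=True)))  # prints 0x2589
```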
9. Tracing VM guests and hypervisors
Atom processors supporting RTIT – tracing guests is possible, but not the hypervisor
Broadwell – no support at all
Skylake – full support
11. Want to use Processor Trace right now? *
Linux kernel 4.1 comes with integrated PT support
Linux kernel 4.3 supports tracing using the perf user-space tools
libipt – an open source PT decoding library
GDB 7.10 supports using PT for tracing
simple-pt – an open source implementation of PT on Linux (used to create the trace pictures on the previous slide)
* processor supporting PT sold separately ;)
12. Exploitation and the NX Bit
[Figure: a PDF (“Hi!”) carrying embedded shellcode]
When the PDF is opened, the shellcode will be in memory that isn’t executable (NX bit)
How do attackers run code that makes their shellcode executable?
◦ Use code that is already executable (the program’s code)
This exploitation technique comes in many forms, most notably ROP – Return Oriented Programming
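To make the idea concrete, here is a toy model of a ROP chain, not real exploit code: the gadget addresses and the register dictionary are hypothetical. Each “gadget” is a short piece of existing code ending in `ret`, and every `ret` pops the next attacker-supplied address off the stack, so control never returns to a caller.

```python
def pop_rax(regs, stack, sp):
    regs["rax"] = stack[sp]     # models "pop rax ; ret"
    return sp + 1


def pop_rcx(regs, stack, sp):
    regs["rcx"] = stack[sp]     # models "pop rcx ; ret"
    return sp + 1


def add_rax_rcx(regs, stack, sp):
    regs["rax"] += regs["rcx"]  # models "add rax, rcx ; ret"
    return sp


def run_rop_chain(gadgets, stack):
    """Simulate a ROP chain: each `ret` pops the next gadget address."""
    regs, executed, sp = {}, [], 0
    while sp < len(stack):
        addr = stack[sp]        # the `ret` pops an attacker-supplied address
        sp += 1
        executed.append(addr)
        sp = gadgets[addr](regs, stack, sp)
    return regs, executed


gadgets = {0x401000: pop_rax, 0x401010: pop_rcx, 0x401020: add_rax_rcx}
regs, executed = run_rop_chain(gadgets, [0x401000, 5, 0x401010, 7, 0x401020])
print(regs["rax"])  # prints 12
```

Every return in `executed` lands on a gadget rather than after a call instruction; that is exactly the anomaly a trace-based checker looks for.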
13. Using executable memory that is already in the program usually involves moving around the process rather strangely, for example:
◦ Not returning to a function’s caller
◦ Calling addresses in the middle of functions instead of at the beginning
◦ …
“Jump Around, Jump Around…” / House of Pain
14. Control Flow Integrity (CFI)
Establish rules for how code flows in the process
◦ Functions return to their callers
◦ Calls are made to the beginning of functions
◦ …
How can these rules be enforced?
◦ Add rule checking to the program’s binary
◦ Trace the program while it runs and go over the log (this work)
◦ Use other CPU features to detect “surprising” branches
“Control-Flow Integrity: Principles, Implementations, and Applications”, Abadi, Budiu, Erlingsson, Ligatti, 2005
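The rule “functions return to their callers” can be checked over a decoded trace with a shadow stack. A minimal sketch, assuming the trace has already been decoded into a hypothetical event list of `("call", call_site, return_address)` and `("ret", ret_site, target)` tuples; a real verifier also has to handle the legitimate exceptions discussed later (exception dispatching, user mode callbacks, …).

```python
def check_returns(events):
    """Return (ret_site, expected, actual) for every return that does not
    go back to the instruction after its matching call."""
    shadow = []      # return addresses pushed by calls, innermost last
    violations = []
    for kind, site, addr in events:
        if kind == "call":
            shadow.append(addr)
        elif kind == "ret":
            expected = shadow.pop() if shadow else None
            if addr != expected:
                violations.append((site, expected, addr))
    return violations


benign = [("call", 0x1000, 0x1005), ("ret", 0x2010, 0x1005)]
rop = [("call", 0x1000, 0x1005), ("ret", 0x2010, 0x401000)]
print(check_returns(benign), check_returns(rop))
```

A return with an empty shadow stack is reported with `expected` set to `None`, which is what a chain of returns into gadgets typically produces.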
15. Prior Work
“Security Breaches as PMU Deviation”, Yuan, Xing, Chen, Zang, 2011
“kBouncer: Efficient and Transparent ROP Mitigation”, Pappas, winner of the Microsoft BlueHat Prize 2012, uses an earlier CPU branch tracing capability (Last Branch Record)
“CFIMon: Detecting Violation of Control Flow Integrity using Performance Counters”, Xia, Liu, Chen, Zang, 2012
“Taming ROP on Sandy Bridge”, Wicherski of CrowdStrike, 2013
“Transparent ROP Detection using CPU Performance Counters”, Li, Crouse, THREADS 2014
and more…
16. Our Implementation
Anti-exploitation system that scans files based on CFI (think PDFs opened in Adobe Reader)
Detects whether “illegal” returns were made, as in ROP
◦ Easy to add other CFI mitigations, such as checking the targets of calls (no calls to the middle of functions, …)
(Soon to be) open source
Developed in 2015
17. Verifying CFI via Processor Trace
Was the flow OK?
Just follow the arrows and calls using the PT-generated packets
18. What information is needed to follow the execution and verify it?
Control Flow Graph (CFG)
◦ Location of functions
◦ Location of basic blocks
◦ …
This is needed for all the libraries loaded by the process: Adobe Reader DLLs, Windows DLLs
◦ Otherwise – false positives
For the Windows binaries, all we have is debugging symbols (PDB files)
19. Static Analysis
We used IDA to recover the CFG
IDA didn’t do a good enough job
◦ Some of the functions and basic blocks in the Adobe Reader / Windows binaries weren’t detected
20. Dynamic Analysis
When supporting a new version of Adobe Reader, IDA is used to get the initial CFG (static analysis)
Afterwards, many PDF files are traced with PT
◦ When a new basic block or function is discovered while following the trace, the CFG is updated
Repeat:
◦ Run IDA on the new CFG
◦ Trace the PDF files again, following IDA’s output
◦ If the CFG was updated in the last iteration, repeat
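The alternation between static and dynamic analysis is a fixpoint loop. The sketch below models the CFG as a set of basic-block addresses; `static_analysis` and `trace_and_discover` are hypothetical stand-ins for the IDA pass and for tracing one file with PT, not the actual tooling.

```python
def refine_cfg(initial_cfg, static_analysis, trace_and_discover, corpus):
    """Alternate static analysis and tracing until no new blocks appear.

    static_analysis(cfg) -> set of block addresses (IDA stand-in)
    trace_and_discover(cfg, f) -> block addresses seen while tracing file f
    """
    cfg = static_analysis(set(initial_cfg))
    iterations = 0
    while True:
        iterations += 1
        discovered = set()
        for f in corpus:
            discovered |= trace_and_discover(cfg, f)
        new_blocks = discovered - cfg
        if not new_blocks:          # CFG unchanged in this iteration: done
            return cfg, iterations
        cfg = static_analysis(cfg | new_blocks)


# Toy stand-ins: "IDA" finds nothing extra, tracing reveals block 0x20.
cfg, rounds = refine_cfg({0x10}, lambda c: set(c),
                         lambda c, f: {0x10, 0x20}, ["a.pdf", "b.pdf"])
print(sorted(cfg), rounds)  # prints [16, 32] 2
```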
21. Optimization
Most of the edges in the CFG are:
◦ Calls relative to the current IP (no packet is generated for those)
◦ Conditional branches
When traversing the CFG during trace verification, fetching the next node in these cases has to be (very) fast
Since the CFG is fixed and built in preprocessing, this isn’t a problem
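One way to make “fetch the next node” cheap is to flatten the preprocessed CFG into arrays indexed by block id, so following a TNT bit is a single array lookup. This is a sketch of that design choice, not the authors’ data structure; for brevity every block here ends in a conditional branch, and the `{addr: (taken, fallthrough)}` input format is an assumption.

```python
from array import array


def build_tables(cfg):
    """Preprocessing: {block_addr: (taken_addr, fallthrough_addr)} ->
    flat successor arrays indexed by block id."""
    addrs = sorted(cfg)
    index = {a: i for i, a in enumerate(addrs)}
    taken = array("i", (index[cfg[a][0]] for a in addrs))
    fallthrough = array("i", (index[cfg[a][1]] for a in addrs))
    return addrs, index, taken, fallthrough


def follow_tnt(tables, start_addr, tnt_bits):
    """Verification hot loop: one array lookup per TNT bit, no disassembly."""
    addrs, index, taken, fallthrough = tables
    block = index[start_addr]
    for bit in tnt_bits:
        block = taken[block] if bit else fallthrough[block]
    return addrs[block]


tables = build_tables({0x10: (0x30, 0x20), 0x20: (0x10, 0x30), 0x30: (0x30, 0x10)})
print(hex(follow_tnt(tables, 0x10, [False, True])))  # prints 0x10
```

Because the CFG does not change during verification, the address-to-index mapping is paid once in preprocessing and the loop itself never hashes addresses.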
22. Optimization
Ideally, no disassembly or CFG modification (both slow) would be done during verification
However, some of the analyzed code is created dynamically; as long as it doesn’t change, it can be dealt with in preprocessing
In cases where it changes every time Adobe Reader is run to open a file, preprocessing isn’t enough
◦ The code is disassembled and the CFG is updated during verification
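A possible way to get both behaviors is to key CFG fragments for dynamically generated code by a hash of the code bytes: code that is identical on every run hits the cache filled in preprocessing, and only code that changed falls back to disassembly at verification time. This is a hypothetical sketch; `disassemble` stands in for whatever builds a CFG fragment from raw bytes.

```python
import hashlib


class DynamicCodeCache:
    """CFG fragments for dynamically generated code, keyed by content hash."""

    def __init__(self, disassemble):
        self.disassemble = disassemble  # hypothetical: bytes -> CFG fragment
        self.fragments = {}
        self.misses = 0

    @staticmethod
    def _key(code):
        return hashlib.sha256(code).hexdigest()

    def preprocess(self, code):
        """Offline: analyze code that is expected to be identical on every run."""
        self.fragments[self._key(code)] = self.disassemble(code)

    def lookup(self, code):
        """During verification: reuse the fragment, or disassemble (slow path)."""
        key = self._key(code)
        if key not in self.fragments:
            self.misses += 1
            self.fragments[key] = self.disassemble(code)
        return self.fragments[key]


cache = DynamicCodeCache(disassemble=len)  # toy "CFG fragment": the code length
cache.preprocess(b"\x90\xc3")
print(cache.lookup(b"\x90\xc3"), cache.misses)  # prints 2 0
```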
23. Thread Information
The execution trace is followed on a per-thread basis
How can we know which thread was executing in each part of the trace?
◦ PT packets provide timing information, but only identify the current process (address space), not the thread
24. Thread Information
Event Tracing for Windows (ETW)
◦ It should be possible to get thread context-switch times, as TSC values, from the CSwitch events provided by ETW
◦ These timestamps could then be synced with the TSC packets from PT to determine which thread was running in each part of the trace
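Once switch-in times are available as TSC values, attributing a stretch of trace to a thread is a binary search. A sketch, assuming a hypothetical list of `(tsc, thread_id)` switch-in events for one logical processor, sorted by time (PT output is per logical processor, so each core would get its own list).

```python
from bisect import bisect_right


def thread_at(switches, tsc):
    """switches: sorted (tsc, thread_id) context-switch-in events for one CPU,
    e.g. extracted from ETW CSwitch events. Returns the thread switched in most
    recently at or before `tsc`, or None if `tsc` precedes all known switches."""
    times = [t for t, _ in switches]  # in a real verifier, precompute this once
    i = bisect_right(times, tsc)
    return switches[i - 1][1] if i else None


switches = [(100, 7), (250, 9), (400, 7)]
print(thread_at(switches, 300))  # prints 9
```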
25. Thread Information
What about getting a callback every time a thread in the traced process is switched in?
◦ As far as we know, there is no direct way
◦ We hooked the Windows context switch function – don’t do that
◦ Endgame presented a way to achieve this via Asynchronous Procedure Calls (Black Hat 2016)
26. Module load / unload
We need to know the executable memory ranges at every point in the trace, i.e., which modules are loaded
Knowing when the PT trace reached ntdll!LdrLoadDll and ntdll!LdrUnloadDll isn’t enough
◦ The module name is needed to update the current memory map
ETW was used to retrieve the name and time (TSC) of each module load / unload, which are then synced with the times of the load / unload functions in the trace
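The memory map itself can be a sorted list of module bases that is updated as the verifier reaches each synced load / unload point in the trace. A sketch: the `ModuleMap` class and the example base addresses and sizes are hypothetical.

```python
from bisect import bisect_right, insort


class ModuleMap:
    """Executable ranges of the traced process, replayed from load / unload events."""

    def __init__(self):
        self.bases = []    # sorted module base addresses
        self.modules = {}  # base -> (name, size)

    def load(self, name, base, size):
        insort(self.bases, base)
        self.modules[base] = (name, size)

    def unload(self, base):
        self.bases.remove(base)
        del self.modules[base]

    def module_at(self, addr):
        """Name of the loaded module containing `addr`, or None."""
        i = bisect_right(self.bases, addr)
        if i == 0:
            return None
        base = self.bases[i - 1]
        name, size = self.modules[base]
        return name if addr < base + size else None


m = ModuleMap()
m.load("example.dll", 0x10000000, 0x1000)  # hypothetical base and size
m.load("ntdll.dll", 0x77000000, 0x2000)
print(m.module_at(0x10000010))  # prints example.dll
```

A branch into an address for which `module_at` returns `None` would then point either at a missing load event or at code outside any known module.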
27. Still not done – functions don’t always return to their callers
For example:
◦ Exception dispatching code
◦ User mode callbacks
◦ …
When a suspected mismatch occurs while going over the trace, the special cases above are checked via binary signatures
This mostly needs to be done per operating system, not per application
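A signature check of this kind runs only on the slow path, after the shadow-stack check flags a mismatch. A sketch: the signature format (a list of byte values with `None` as a wildcard) and the example pattern are hypothetical and do not correspond to real Windows code.

```python
def matches(signature, code):
    """True if `code` starts with `signature`; None in the signature is a wildcard."""
    return len(code) >= len(signature) and all(
        s is None or s == c for s, c in zip(signature, code))


def known_special_case(code_at_target, signatures):
    """Name of the first signature matching the code at the branch target, or None."""
    for name, sig in signatures.items():
        if matches(sig, code_at_target):
            return name
    return None


# Hypothetical pattern; real signatures would be taken from the OS binaries.
signatures = {"hypothetical_exception_dispatch": [0xFC, None, 0x8B]}
print(known_special_case(bytes([0xFC, 0x48, 0x8B, 0x05]), signatures))
```

Since these special cases live in OS code, one signature set can be reused across applications on the same Windows version.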
28. Dynamically generated code
(Almost entirely) not dealt with by our implementation
To decode a PT trace, the code that was executed is needed
One obvious problem is pages that are written to and executed from at the same time
(Maybe) one could remove the write permission every time a page becomes writable and executable, and handle the access violation when it is written to, in order to obtain the new version of the code
29. Dynamically generated code
A case of dynamically generated code that was dealt with:
Applications that hook themselves… with identical hooks, at the same locations and at the same time
To the trace verifier, the code is essentially static
30. Scanning Results
Benign, non-malicious files
◦ Run on 10,000 PDF, 3,000 PPT/PPTX and 3,000 DOC/DOCX files without false positives
Malicious files containing a ROP chain
◦ Run on 5 such files, detecting the exploit and displaying the CFI violation
31. Forget CFI and anti-exploitation…
What if I just want to trace a process quickly with Processor Trace?
You’d still need
◦ Module load / unload information
◦ Thread context switch times
but could somewhat do without
◦ The CFG – a partial CFG can be built from the trace (it doesn’t need to be built in advance)
32. Coming soon to a motherboard near you
Control-flow Enforcement Technology (CET), announced by Intel in June 2016. Release date?
Processors will directly support:
◦ Shadow (call) stack tracking – a return that doesn’t match the shadow stack raises a control protection exception (#CP)
◦ Indirect branch tracking – an indirect branch to a target whose instruction is not ENDBRANCH raises a control protection exception (#CP)
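The indirect branch tracking rule can be modeled in a few lines. This is a toy model of the rule, not of how the hardware implements it: `memory` is a hypothetical map from addresses to the bytes found there, and the model accepts either ENDBRANCH form (ENDBR64 is F3 0F 1E FA, ENDBR32 is F3 0F 1E FB), whereas real hardware expects the one matching the current mode.

```python
ENDBR64 = bytes([0xF3, 0x0F, 0x1E, 0xFA])
ENDBR32 = bytes([0xF3, 0x0F, 0x1E, 0xFB])


class ControlProtectionFault(Exception):
    """Stands in for the #CP exception."""


def indirect_branch(memory, target):
    """Allow an indirect call/jump only if the target starts with ENDBRANCH."""
    if memory.get(target, b"")[:4] not in (ENDBR64, ENDBR32):
        raise ControlProtectionFault(hex(target))
    return target


memory = {
    0x1000: ENDBR64 + b"\x55",  # function entry: endbr64 ; push rbp
    0x1010: b"\x5d\xc3",        # mid-function gadget: pop rbp ; ret
}
print(hex(indirect_branch(memory, 0x1000)))  # prints 0x1000
```

A call into the middle of a function, such as `0x1010` here, raises `ControlProtectionFault`, which is the hardware counterpart of the “calls are made to the beginning of functions” rule.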
33. What about tracing quickly on ARM?
ARM has a feature similar to Processor Trace called CoreSight
Tracing on Linux has been integrated with perf
An open source decoding library exists – OpenCSD
http://www.linaro.org/blog/core-dump/coresight-perf-and-the-opencsd-library/
34. Bypassing CFI
“Control Jujutsu”, Evans, Long, Otgonbaatar, Shrobe, Rinard, Okhravi, Sidiroglou-Douskos, CCS 2015
Uses indirect call sites with controllable targets and arguments (via a vulnerability) to achieve arbitrary code execution (e.g., calling exec or system)
Bypasses CFI because the target functions are legal targets in the CFG
35. Bypassing CFI with “data attacks”
“Write Once, Pwn Anywhere”, Yu, Black Hat USA 2014
◦ Sometimes applications keep security-critical information in a single variable
◦ Pseudo-code from Internet Explorer’s JavaScript engine:
if ((safemode & 0xB) == 0) {    // without the parentheses, == binds tighter than &
    turn_on_god_mode();
}
36. Bypassing CFI with “data attacks”
“Control-Flow Bending”, Carlini, Barresi, Payer, Wagner, Gross, USENIX Security 2015
◦ printf-oriented programming – if you control the arguments, printf can perform arbitrary computation
37. Bypassing CFI with “data attacks”
“Data-Oriented Programming”, Hu, Shinde, Sendroiu, Chua, Saxena, Liang, S&P 2016
Goal: perform arbitrary computation while adhering to the CFG
Similar in spirit to ROP – use parts of the original program as the “instructions” of a “VM” controlled by the attacker
“Data gadgets” are used to perform computation on data
38. “Data-oriented programming” (cont.)
Gadgets are executed one after the other using constructs already in the vulnerable program, such as loops
The vulnerability being exploited determines which data gadget runs and on what data
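A toy model shows why a CFG-based checker cannot see this. The loop below stands in for a loop that already exists in the vulnerable program; the `records` it processes are data the attacker has corrupted through the vulnerability. Every statement and every branch is part of the original program, so the control flow is legal, yet the attacker chooses which data gadget runs and on which memory cells. The function and record layout are hypothetical illustrations, not taken from the paper.

```python
def victim_loop(mem, records):
    """Each record is attacker-controlled data: (selector, p, q)."""
    for selector, p, q in records:  # the program's own loop acts as the dispatcher
        if selector == 0:
            mem[p] = mem[q]         # "move" data gadget
        elif selector == 1:
            mem[p] += mem[q]        # "add" data gadget
        else:
            mem[p] = 0              # "clear" data gadget
    return mem


print(victim_loop([0, 5, 7], [(1, 1, 2), (0, 0, 1)]))  # prints [12, 12, 7]
```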