[CB16] COFI break – Breaking exploits with Processor trace and Practical control flow integrity by Ron Shina & Shlomi Oberman

Anti exploitation and Control Flow
Integrity with Processor Trace

Brought to you by
Shlomi Oberman
independent security researcher
Ron Shina
independent security researcher

 Tracing – what executed and
when?
 Code optimization and profiling
◦ Sampling
◦ Instrumentation
Intel Processor Trace (PT)

Intel PT
 Processor feature enabling instruction tracing with
low overhead – documentation says about 5%
◦ Tens of times faster than the previous option
 Available on Intel Broadwell and Skylake processors
 A similar feature, Real Time Instruction Trace, exists
on certain Intel Atom processors

Packets
 Processor writes trace to memory as packets
 Packet Types
◦ Taken / Not Taken packets for conditional branches
◦ IP packets for indirect branches
◦ Timestamp packets
◦ …
 Binary is needed to recreate the instruction trace

Decoded Trace Packets
[Figure: decoded trace packets, showing a call to foo and branch taken / not taken outcomes]
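Why the binary is needed can be shown with a minimal sketch. The packets only record decisions (taken / not taken, indirect targets); everything else has to be read from the program itself. The toy block table, the ("tnt", bool) and ("tip", address) tuples, and the function name below are assumptions made for illustration. They are not the real PT packet encoding or the libipt API.

```python
# Toy "binary": each basic block ends in one kind of branch.
#   "cond": conditional branch, consumes one taken / not-taken bit
#   "jmp":  direct jump or call, target is known from the binary, no packet
#   "ind":  indirect branch, consumes one IP packet
#   "end":  stop following the trace
TOY_BINARY = {
    0x10: ("cond", 0x30, 0x20),  # (kind, taken target, fall-through target)
    0x20: ("jmp", 0x40),
    0x30: ("ind",),
    0x40: ("end",),
    0x50: ("end",),
}


def reconstruct(binary, start, packets):
    """Return the list of block addresses executed, driven by the packets."""
    packets = iter(packets)
    path, addr = [], start
    while True:
        path.append(addr)
        block = binary[addr]
        kind = block[0]
        if kind == "end":
            return path
        if kind == "jmp":
            addr = block[1]  # no packet: the target comes from the binary
        elif kind == "cond":
            tag, taken = next(packets)
            assert tag == "tnt"
            addr = block[1] if taken else block[2]
        elif kind == "ind":
            tag, target = next(packets)
            assert tag == "tip"
            addr = target


if __name__ == "__main__":
    print([hex(a) for a in reconstruct(TOY_BINARY, 0x10, [("tnt", False)])])
    print([hex(a) for a in reconstruct(TOY_BINARY, 0x10, [("tnt", True), ("tip", 0x50)])])
```

Note that the direct jump at 0x20 consumes no packet at all, so without the block table the decoder could not know where execution went next.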

 User and/or kernel tracing
 Filter by process
 Starting or stopping the trace based on address
ranges (only in later processors)
Configuration options

 Atom processors supporting RTIT – tracing guests
possible, but not the hypervisor
 Broadwell – no support at all
 Skylake – full support
Tracing VM guests and hypervisors

[Figure: Intel PT output + traced program’s binary = instruction trace]

 Linux kernel 4.1 comes with integrated PT support
 Linux kernel 4.3 supports tracing using perf user tools
 An open source PT decoding library – libipt
 GDB 7.10 supports using PT for tracing
 simple-pt – an open source implementation of PT on Linux
(used to create the trace pictures on the previous slide)
* processor supporting PT included separately ;)
Want to use Processor Trace right now? *

Exploitation and the NX Bit
[Figure: a pdf file that displays “Hi!” and carries shellcode]
 When the pdf is opened, the shellcode sits in
memory that isn’t executable (NX bit)
 How do attackers run code that makes
their shellcode executable?
◦ Use code that is already executable (the
program’s own code)
 This exploitation technique comes in
many forms, most notably ROP (Return
Oriented Programming)

 Reusing executable memory already in the
program usually means moving around the
process in unusual ways, for example:
◦ Not returning to a function’s caller
◦ Calling addresses in the middle of functions,
instead of at the beginning
◦ …
“Jump Around, Jump around…” / House of Pain

 Establish rules for how the code flows in the process
◦ Functions return to their callers
◦ Calls are made to the beginning of functions
◦ …
 How can those rules be enforced?
◦ Add rule checking to the program’s binary
◦ Trace the program while running and go over the log (this work)
◦ Use other CPU features to detect “surprising” branches
“Control Flow Integrity Principles, Implementations, and Applications”, Abadi,
Budiu, Erlingsson, Ligatti, 2005
Control Flow Integrity (CFI)

 “Security Breaches as PMU Deviation”, Yuan, Xing, Chen, Zang 2011
 “kBouncer: Efficient and Transparent ROP Mitigation” – Pappas, Winner of Microsoft
BlueHat competition 2012, uses previous CPU branch tracing capabilities
 “CFIMon: Detecting Violation of Control Flow Integrity using Performance Counters” –
Xia, Liu, Chen, Zang 2012
 “Taming ROP on Sandy Bridge”, Wicherski of Crowdstrike, 2013
 “Transparent ROP Detection using CPU Performance Counters”, Li, Crouse, THREADS
2014
 and more…
Prior Work

 Anti exploitation system to scan files based on CFI
(think pdf on Adobe Reader)
 Detects whether “illegal” returns were made, like in
ROP
◦ Easy to add other CFI mitigations, such as checking the
targets of calls (no calls to the middle of functions, …)
 (Soon to be) Open Source
 Developed in 2015
Our Implementation
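The "illegal return" check can be sketched as a shadow stack replayed over the call and return events recovered from a trace. This is a simplified illustration, not the authors' verifier: the event tuples and the function name are assumptions, and special cases such as exception dispatching are ignored.

```python
def check_returns(events):
    """events: iterable of ("call", return_address) or ("ret", target).

    Returns the index of the first return whose target does not match the
    most recent unmatched call, or None if every return matches.
    """
    shadow = []  # return addresses we expect, innermost call last
    for i, (kind, addr) in enumerate(events):
        if kind == "call":
            shadow.append(addr)
        elif kind == "ret":
            if not shadow or shadow.pop() != addr:
                return i  # CFI violation: return does not go back to the caller
    return None


if __name__ == "__main__":
    benign = [("call", 0x105), ("call", 0x205), ("ret", 0x205), ("ret", 0x105)]
    rop_like = [("call", 0x105), ("ret", 0x4141)]
    print(check_returns(benign), check_returns(rop_like))
```

Checking call targets (no calls into the middle of functions) would be a second pass over the same events, comparing each target against the set of known function entry points.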

Verifying CFI via Processor Trace
 Was the flow OK?
 Just follow the arrows and
calls using the PT generated
packets

What information is needed to follow the
execution and verify it?
 Control Flow Graph (CFG)
◦ Location of functions
◦ Location of basic blocks
◦ …
 Need this for all the libraries loaded
by the process – Adobe Reader dlls,
Windows dlls
◦ If not – false positives 
 All we have is debugging symbols,
pdb files, for the Windows binaries

 We used IDA to recover the CFG
 IDA didn’t do a good enough job
◦ Some of the functions and basic blocks in the Adobe Reader
and Windows binaries weren’t detected
Static Analysis

 When supporting a new version of Adobe Reader, IDA is used
to get the initial CFG (static analysis)
 Afterwards, many pdf files are traced with PT
◦ When a new basic block or function is discovered while following the
trace – the CFG is updated
 Repeat
◦ run IDA on the new CFG
◦ trace the pdf files against IDA’s output
◦ if the CFG was updated in the last iteration, repeat
Dynamic Analysis
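The iteration above is a fixed-point loop: keep refining until a full round discovers nothing new. A minimal sketch follows, assuming the CFG is reduced to a set of known block addresses and that `reanalyze` is a hypothetical stand-in for rerunning the static analysis with the newly discovered addresses as hints. In this toy the traces are fixed recorded lists rather than fresh runs.

```python
def refine_cfg(initial_blocks, traces, reanalyze):
    """Grow the set of known block addresses until tracing finds nothing new.

    Returns (known_blocks, number_of_rounds).
    """
    known = set(initial_blocks)
    rounds = 0
    while True:
        rounds += 1
        # Blocks seen while following the traces that the CFG does not have yet.
        discovered = {addr for trace in traces for addr in trace} - known
        if not discovered:
            return known, rounds
        # Feed the discoveries back into static analysis, which may find more.
        known |= reanalyze(known | discovered)
        known |= discovered


# Hypothetical static analysis: block 0x34 is only found once 0x30 is known.
STATIC_SUCCESSORS = {0x30: {0x34}}


def toy_reanalyze(blocks):
    found = set(blocks)
    for block in blocks:
        found |= STATIC_SUCCESSORS.get(block, set())
    return found


if __name__ == "__main__":
    known, rounds = refine_cfg({0x10, 0x20}, [[0x10, 0x30], [0x10, 0x20]], toy_reanalyze)
    print(sorted(hex(a) for a in known), rounds)
```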

 Most of the edges in the CFG are:
◦ Calls relative to the current IP (no
packet for those)
◦ Conditional branches
 When traversing the CFG during
trace verification, fetching the next
node in these cases has to be (very)
fast
 Since the CFG is fixed and built in
preprocessing, this isn’t a problem
Optimization
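One possible way to make the "fetch the next node" step cheap, given a fixed CFG, is to compile it once into flat successor arrays so that following a conditional branch is a single list index. This is an illustrative sketch under that assumption, not a description of the authors' data structures. The function names and the block table layout are hypothetical.

```python
def compile_cfg(blocks):
    """blocks: {addr: (taken_addr_or_None, fallthrough_addr_or_None)}.

    Returns (addrs, taken, fall), where taken[i] / fall[i] hold the index of
    the successor node, or -1 when there is none.
    """
    addrs = sorted(blocks)
    index = {addr: i for i, addr in enumerate(addrs)}
    taken = [index.get(blocks[addr][0], -1) for addr in addrs]
    fall = [index.get(blocks[addr][1], -1) for addr in addrs]
    return addrs, taken, fall


def walk(addrs, taken, fall, start_index, bits):
    """Follow taken / not-taken bits from start_index; return the final address."""
    node = start_index
    for bit in bits:
        node = taken[node] if bit else fall[node]
        if node < 0:
            raise ValueError("trace leaves the known CFG")
    return addrs[node]


if __name__ == "__main__":
    blocks = {0x10: (0x30, 0x20), 0x20: (0x10, 0x30), 0x30: (None, None)}
    addrs, taken, fall = compile_cfg(blocks)
    print(hex(walk(addrs, taken, fall, 0, [False, True, True])))
```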

 Ideally, no disassembly and CFG modification (slow)
would be done during verification
 However, some of the code analyzed is created
dynamically – as long as it doesn’t change, this can be
dealt with in preprocessing
 In cases where it changes every time “Adobe Reader” is
run to open a file, preprocessing isn’t enough
◦ the code is disassembled and the CFG is updated during verification
Optimization

 Following the execution trace is done on a per
thread basis
 How to know which thread was executing at
each part of the trace?
◦ PT packets give timing information, but only
identify the current process, not the thread
Thread information

 Event Tracing for Windows (ETW)
◦ It should be possible to get thread context switch
times as TSC values from the CSwitch events
provided by ETW
◦ These timestamps could then be synchronized with the TSC
packets from PT to determine which thread was
running in each part of the trace
Thread Information
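The synchronization idea can be sketched as a binary search over context switch timestamps. This assumes both sources have already been converted to the same TSC scale. The tuple layouts and function names are hypothetical, and no real ETW or PT decoding is shown.

```python
import bisect


def thread_at(cswitches, tsc):
    """cswitches: list of (tsc, thread_id) sorted by tsc, each meaning
    'thread_id was switched in at tsc'.

    Returns the thread running at tsc, or None if tsc precedes the first
    recorded switch.
    """
    times = [t for t, _ in cswitches]
    i = bisect.bisect_right(times, tsc) - 1
    return cswitches[i][1] if i >= 0 else None


def split_by_thread(cswitches, trace_chunks):
    """trace_chunks: list of (tsc, chunk). Groups the chunks per thread."""
    per_thread = {}
    for tsc, chunk in trace_chunks:
        per_thread.setdefault(thread_at(cswitches, tsc), []).append(chunk)
    return per_thread


if __name__ == "__main__":
    switches = [(100, 1), (200, 2), (300, 1)]
    chunks = [(120, "a"), (250, "b"), (310, "c")]
    print(split_by_thread(switches, chunks))
```

Each per-thread list can then be followed independently, which is what a per-thread call stack check needs.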

 What about getting a callback every time a thread in the
traced process is switched in?
◦ As far as we know, there is no direct way
◦ We hooked the Windows context switch function – don’t do that
◦ Endgame presented a way to achieve this via Asynchronous
Procedure Calls (Black Hat 2016)
Thread Information

 Need to know the executable memory ranges at all
points in the trace – what modules are loaded
 Knowing when the PT trace reached ntdll!LdrLoadDll
and ntdll!LdrUnloadDll isn’t enough
◦ Module name is needed to update the current memory map
 ETW was used to retrieve the name and time (TSC) of each
module load / unload; these are then synchronized with the
times of the load / unload functions in the trace
Module load / unload
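Keeping the memory map current can be sketched as replaying load / unload events up to the timestamp of interest and then resolving an address against whatever is loaded at that moment. The event tuple layout, the module names and the function name are assumptions for illustration only.

```python
def module_at(events, tsc, address):
    """events: list of (tsc, "load" | "unload", name, base, size) sorted by tsc.

    Replays the events up to and including tsc and returns (name, offset)
    for address, or None if no loaded module covers it at that time.
    """
    loaded = {}
    for when, kind, name, base, size in events:
        if when > tsc:
            break
        if kind == "load":
            loaded[name] = (base, size)
        else:
            loaded.pop(name, None)
    for name, (base, size) in loaded.items():
        if base <= address < base + size:
            return name, address - base
    return None


if __name__ == "__main__":
    events = [
        (10, "load", "a.dll", 0x1000, 0x1000),
        (20, "unload", "a.dll", 0x1000, 0x1000),
        (30, "load", "b.dll", 0x1000, 0x800),
    ]
    for t in (15, 25, 35):
        print(t, module_at(events, t, 0x1010))
```

The example shows why the module name matters: the same address belongs to different modules, or to none, depending on when in the trace it is reached.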

Still not done – functions don’t always return
to their callers
 For example:
◦ Exception dispatching code
◦ User mode callbacks
◦ …
 While going over the trace, suspected mismatches
are checked against these special cases via binary
signatures
 This mostly needs to be done per operating system, not
per application
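A binary signature check of this kind can be sketched as byte pattern matching with wildcards at the address where the suspected mismatch occurred. The signature name and byte values below are made up for illustration; they are not real Windows code patterns.

```python
def matches(code, offset, signature):
    """signature: sequence of ints (exact byte) or None (wildcard byte)."""
    window = code[offset:offset + len(signature)]
    if len(window) < len(signature):
        return False  # not enough bytes left to match the whole pattern
    return all(s is None or s == b for s, b in zip(signature, window))


def is_known_special_case(code, offset, signatures):
    """True if any signature matches the code at offset."""
    return any(matches(code, offset, sig) for sig in signatures.values())


if __name__ == "__main__":
    # Hypothetical pattern: two fixed bytes with one wildcard in between.
    signatures = {"hypothetical_dispatcher": [0x48, None, 0xC4]}
    code = bytes([0x90, 0x48, 0x8B, 0xC4, 0xC3])
    print(is_known_special_case(code, 1, signatures))
    print(is_known_special_case(code, 0, signatures))
```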

 (Almost entirely) not dealt with by our implementation
 Following a PT trace requires the code that was executed
 One obvious problem is pages that are written to and
executed from at the same time
 (Maybe) one could remove the write permission every
time a page becomes writable and executable, then handle
the access violation raised when it gets written to, in order to
obtain the code’s new version
Dynamically generated code

 A case of dynamically generated code that was dealt with:
 Applications that hook themselves… with identical
hooks, at the same locations and same time
 To the trace verifier, the code is essentially static
Dynamically generated code

 Benign, non-malicious files
◦ Ran on 10,000 pdf, 3,000 ppt/x and 3,000 doc/x files
without false positives
 Malicious files containing a ROP chain
◦ Ran on 5 such files, detecting the exploit and
displaying the CFI violation
Scanning Results

Forget CFI and anti-exploitation…
What if I just want to trace a process quickly
with Processor Trace?
 You’d still need
◦ Module load / unload information
◦ Thread context switch times
 But could somewhat do without
◦ The CFG – a partial CFG can be built from the trace (it
doesn’t need to be built in advance)
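Building a partial CFG from the trace can be sketched as collecting the edges that were actually observed. The input format (lists of block addresses already recovered from traces) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def cfg_from_paths(paths):
    """paths: iterable of block-address sequences recovered from traces.

    Returns {block: set of successor blocks that were observed}.
    """
    cfg = defaultdict(set)
    for path in paths:
        for src, dst in zip(path, path[1:]):
            cfg[src].add(dst)
    return dict(cfg)


if __name__ == "__main__":
    print(cfg_from_paths([[1, 2, 4], [1, 3, 4]]))
```

Such a graph only contains edges that some run exercised, which is fine for exploring a trace but, unlike a CFG built in advance, says nothing about which unseen edges would be legal.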

 Control-flow Enforcement Technology (CET) announced
by Intel in June 2016. Release date?
 Processors will directly support:
◦ Shadow (call) stack tracking – a mismatched return
raises a control protection exception
◦ Indirect branch tracking – an indirect branch to a
target whose instruction is not ENDBRANCH
raises a control protection fault
Coming soon to a motherboard near you

 ARM has a feature similar to Processor Trace called
CoreSight
 Tracing on Linux has been integrated with perf
 Open source decoding library exists – OpenCSD
http://www.linaro.org/blog/core-dump/coresight-perf-and-the-opencsd-library/
What about tracing quickly on ARM?

 “Control Jujutsu” – Evans, Long, Otgonbaatar, Shrobe, Rinard,
Okhravi, Sidiroglou-Douskos, CCS 2015
 Uses indirect call sites with controllable targets and
arguments (via vulnerability) to achieve arbitrary code
execution (e.g., call exec or system)
 Bypasses CFI because the target functions are legal in the
CFG
Bypassing CFI

 “Write Once, Pwn Anywhere”, Yu, Black Hat USA 2014
◦ Sometimes applications have security critical
information in one variable
◦ Pseudo-code from Internet Explorer’s JavaScript engine:
if ((safemode & 0xB) == 0) {
    turn_on_god_mode();
}
Bypassing CFI with “data attacks”

 “Control Flow Bending”, Carlini, Barresi, Payer, Wagner, Gross,
USENIX Security 2015
◦ printf-oriented-programming – if you control the
arguments, printf can do arbitrary computation

 “Data oriented programming” – Hu, Shinde, Adrian,
Chua, Saxena, Liang, S&P 2016
 goal: perform arbitrary computation while adhering
to the CFG
 Similar to ROP in spirit – use parts of the original
program as “instructions” of a “VM” controlled by
the attacker
 “data gadgets” are used to perform computation on
data

 gadgets are executed one after the other by using
constructs already in the vulnerable program – such
as loops
 the vulnerability being exploited is used to determine
which data gadget gets run and on what data
“data oriented programming” (cont)
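The idea can be illustrated with a toy model: an ordinary dispatcher loop whose branches are all legal in the CFG, where attacker-influenced data alone decides which "data gadget" runs and on which operands. The operation names, the memory layout and the function name are assumptions for illustration. This is not code from the paper or from any real target.

```python
def vulnerable_loop(memory, requests):
    """A benign-looking dispatcher loop: every branch below is legal in the CFG."""
    for op, dst, src in requests:              # attacker-influenced data
        if op == "add":
            memory[dst] += memory[src]         # arithmetic gadget
        elif op == "mov":
            memory[dst] = memory[src]          # assignment gadget
        elif op == "load":
            memory[dst] = memory[memory[src]]  # dereference gadget
    return memory


if __name__ == "__main__":
    memory = {0: 5, 1: 7, 2: 0, 3: 1}
    program = [("mov", 2, 0), ("add", 2, 1), ("load", 0, 3)]
    print(vulnerable_loop(memory, program))
```

A return or call target check sees nothing unusual here, because the loop only ever takes edges that already exist in the program.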
