Symbolic Execution
- Overview of work done by Dawson Engler’s group at Stanford (EGT/EXE/KLEE*)
- By Shauvik Roy Choudhary, http://cc.gatech.edu/~shauvik
- Some slides adapted from the EXE and KLEE presentations, plus slides from Saswat
Old research area but still active
- First introduced in 1975 (source: Saswat) / 1976, by James King, IBM T.J. Watson
- Very active area of research, e.g.:
  - EGT / EXE / KLEE [Stanford]
  - DART [Bell Labs]
  - CUTE [UIUC]
  - SAGE, Pex [MSR Redmond]
  - Vigilante [MSR Cambridge]
  - BitScope [Berkeley/CMU]
  - CatchConv [Berkeley]
  - JPF [NASA Ames]
Symbolic Execution
- Symbolic execution refers to executing a program with symbols as arguments.
- Unlike concrete execution, in symbolic execution the program can take any feasible path (limitation: constraint solver).
- During symbolic execution, program state consists of
  - symbolic values for some memory locations
  - a path condition
- The path condition is a conjunction of constraints on the symbolic input values.
- A solution of the path condition is a test input that covers the respective path.
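As a minimal sketch of these definitions: the function `classify`, its path numbering, and the brute-force `solve_path` below are hypothetical illustrations, not taken from the slides. A real tool would hand the path condition to a constraint solver instead of searching a range.

```c
#include <assert.h>

/* Hypothetical function under test. Each return is reached under one
 * path condition over the symbolic input x:
 *   path 0: x < 0
 *   path 1: !(x < 0) && x > 100
 *   path 2: !(x < 0) && !(x > 100)
 */
int classify(int x) {
    if (x < 0)
        return 0;
    if (x > 100)
        return 1;
    return 2;
}

/* The path conditions written out as predicates, one per path. */
int path_condition(int path, int x) {
    switch (path) {
    case 0:  return x < 0;
    case 1:  return !(x < 0) && x > 100;
    case 2:  return !(x < 0) && !(x > 100);
    default: return 0;
    }
}

/* Stand-in for a constraint solver: brute-force search over [lo, hi].
 * Returns 1 and stores a satisfying input in *out, or 0 if none exists
 * in that range. */
int solve_path(int path, int lo, int hi, int *out) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        if (path_condition(path, x)) {
            *out = x;
            return 1;
        }
    }
    return 0;
}
```

Calling `solve_path(p, -200, 200, &x)` for p = 0, 1, 2 yields one input per path, and running `classify` on that input covers the corresponding path.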
Implementation of Symbolic Execution
- Transformation approach
  - Transform the program into another program that operates on symbolic values, such that executing the transformed program is equivalent to symbolic execution of the original program
  - Difficult to implement, portable solution, suitable for Java, .NET
- Instrumentation approach
  - Callback hooks are inserted in the program so that symbolic execution is done in the background during normal execution of the program
  - Easy to implement for C
- Customized runtime approach
  - Customize the runtime (e.g., JVM) to support symbolic execution
  - Applicable to Java, .NET; difficult to implement, flexible, not portable
- Example tools labeled on the slide: CUTE, KLEE, JPF
Limitations of Symbolic Execution
- Limited by the power of the constraint solver
  - Cannot handle non-linear and very complex constraints
- Does not scale when the number of paths is large (subject of ongoing research in this area)
- Source code, or an equivalent (e.g., Java class files), is required for precise symbolic execution
EGT & EXE
- Slides based on D. Engler’s slides
Goal: find many bugs in systems code
- Generic features: baroque interfaces, tricky input, rat’s nest of conditionals
- Enormous undertaking to hit with manual testing
- Random “fuzz” testing
  - Charm: no manual work
  - Blind generation makes it hard to hit errors that need a narrow input range
  - Also hard to hit errors that require structure
- This talk: a simple trick to finesse
EGT: Execution Generated Testing [SPIN’05]
How to make system code crash itself!
- Basic idea: use the code itself to construct its input!
- Basic algorithm: symbolic execution + constraint solving
  - Run code on symbolic inputs, initial value = “anything”
  - As code observes inputs, it tells us what values they can be
  - At conditionals that use symbolic input, fork
    - On the true branch, add the constraint that the input satisfies the check
    - On the false branch, add the constraint that it does not
  - Then generate inputs from these constraints and re-run the code on them
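The basic algorithm can be sketched as follows, under strong simplifying assumptions: there is a single symbolic integer x, every check has the form `x < c`, the path condition is kept as an interval, and "forking" is emulated by re-running the code with a recorded decision prefix rather than by creating processes. The names `sym_lt`, `toy`, and `generate_tests` are hypothetical and are not EGT's API.

```c
#include <assert.h>

#define MAX_DEPTH 16   /* assumed bound on symbolic branches per path */
#define MAX_WORK  64   /* assumed bound on pending forks */

typedef struct { int decision[MAX_DEPTH]; int len; } Prefix;

static Prefix worklist[MAX_WORK];
static int    n_work;

/* State of one run: path condition lo <= x <= hi, plus branch decisions. */
typedef struct {
    long   lo, hi;
    Prefix replay;   /* decisions forced by the fork that scheduled this run */
    Prefix taken;    /* decisions made so far on this run */
} Run;

/* Symbolic version of "x < c": pick a side, schedule the other if feasible. */
static int sym_lt(Run *r, long c) {
    int i = r->taken.len;
    assert(i < MAX_DEPTH);
    int true_ok  = r->lo < c;    /* some x in [lo, hi] satisfies x < c  */
    int false_ok = r->hi >= c;   /* some x in [lo, hi] satisfies x >= c */
    int d;
    if (i < r->replay.len) {
        d = r->replay.decision[i];       /* replaying a scheduled fork */
    } else if (true_ok && false_ok) {
        d = 1;                           /* take true now ...            */
        assert(n_work < MAX_WORK);
        Prefix alt = r->taken;           /* ... and fork the false side  */
        alt.decision[alt.len++] = 0;
        worklist[n_work++] = alt;
    } else {
        d = true_ok;                     /* only one side is feasible */
    }
    r->taken.decision[r->taken.len++] = d;
    if (d) { if (r->hi > c - 1) r->hi = c - 1; }   /* add x < c  */
    else   { if (r->lo < c)     r->lo = c;     }   /* add x >= c */
    return d;
}

/* Code under test, with each "x < c" rewritten to sym_lt(r, c). */
static int toy(Run *r) {
    if (sym_lt(r, 0))  return -1;
    if (sym_lt(r, 10)) return 0;
    return 1;
}

/* Explores every path of toy() and stores one concrete input per path.
 * Returns the number of inputs written to tests[]. */
int generate_tests(long min, long max, long *tests, int cap) {
    int n = 0;
    n_work = 0;
    worklist[n_work++] = (Prefix){ .len = 0 };
    while (n_work > 0 && n < cap) {
        Run r = { .lo = min, .hi = max, .replay = worklist[--n_work] };
        (void)toy(&r);
        tests[n++] = r.lo;   /* "solve": any value in [lo, hi] works */
    }
    return n;
}
```

With an input range such as [-1000, 1000], the code under test runs once per path (three times here), and each run ends with an interval from which one concrete test input is picked.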
The toy example
- Initialize x to be “any int”
- Code will run 3 times
- Solve constraints at each marked point to get our 3 test cases
The big picture
- Implementation prototype
  - Do source-to-source transformation using CIL
  - Use the CVCL decision procedure to solve constraints, then re-run code on concrete values
  - Robustness: use mixed symbolic and concrete execution
- 3 ways to look at what’s going on
  - Grammar extraction
  - Turn code inside out, from input consumer to generator
  - Sort-of Heisenberg effect: observations perturb symbolic inputs into increasingly concrete ones. More definite observation = more definite perturbation
Mixed execution
- Basic idea: given an operation:
  - If all of its operands are concrete, just do it.
  - If any are symbolic, add constraint.
  - If current constraints are impossible, stop.
  - If current path causes something to blow up, solve & emit.
  - If current path calls unmodelled function, solve & call.
  - If program exits, solve & emit.
- How to track?
  - Use variable addresses to determine if symbolic or concrete
  - Note: symbolic assignment is not destructive; it creates a new symbol
Example transformation: “+”
- Each var v has v.concrete and v.symbolic fields
- If v is concrete, v.symbolic = <invalid>, and vice versa
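The slide's code did not survive extraction, so the following is only a sketch of what a transformed “+” might look like. The `Val` layout, the text representation of symbolic expressions, and the helper names are assumptions for illustration; they are not the EGT implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Each variable carries either a concrete value or a symbolic expression;
 * exactly one of the two is valid at a time (assumed layout). */
typedef struct {
    int  is_symbolic;
    int  concrete;        /* valid when !is_symbolic */
    char symbolic[64];    /* valid when is_symbolic: expression as text */
} Val;

Val make_concrete(int c) {
    Val v = { 0 };
    v.concrete = c;
    return v;
}

Val make_symbolic(const char *name) {
    Val v = { 0 };
    assert(strlen(name) < sizeof v.symbolic);
    v.is_symbolic = 1;
    snprintf(v.symbolic, sizeof v.symbolic, "%s", name);
    return v;
}

/* Renders an operand as expression text. */
static void operand_text(const Val *v, char *buf, size_t n) {
    if (v->is_symbolic) snprintf(buf, n, "%s", v->symbolic);
    else                snprintf(buf, n, "%d", v->concrete);
}

/* Transformed "a + b". */
Val add(Val a, Val b) {
    if (!a.is_symbolic && !b.is_symbolic)
        return make_concrete(a.concrete + b.concrete);   /* just do it */

    /* At least one operand is symbolic: build a new symbolic expression
     * instead of computing a value (long expressions are truncated). */
    char ta[64], tb[64];
    operand_text(&a, ta, sizeof ta);
    operand_text(&b, tb, sizeof tb);
    Val r = { 0 };
    r.is_symbolic = 1;
    snprintf(r.symbolic, sizeof r.symbolic, "(%s + %s)", ta, tb);
    return r;
}
```

For example, `add(make_concrete(2), make_concrete(3))` stays concrete with value 5, while `add(make_symbolic("y"), make_concrete(3))` produces the symbolic expression `(y + 3)`.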
Results
- Mutt versions <= 1.4 have a buffer overflow (OSDI paper)
  - Input size 4: took 34 minutes to generate 458 tests with 98% statement coverage
- printf (3 implementations: Pintos, gccfast, embedded)
  - Made format strings symbolic
  - Two bugs
    - Incorrect grouping of integers
    - Incorrect handling of plus flags (“%” followed by space)
More...
- WsMP3 server case study
  - 2000 LOC
  - Technique: make recv input symbolic
  - Found known security hole + 2 new bugs
    - Network-controlled infinite loop
    - Buffer overflow
EXE: EXecution generated Executions [CCS’06]
Automatically Generating Inputs of Death!
- Same ideas as EGT
- Main contributions
  - More practical tool:
    - Can test any code path
    - Generates actual attacks
  - Constraint solver: STP
    - Decision procedure for bitvectors and arrays
    - If solvable, passes constraints to MiniSAT
    - About four times less code than CVCL and an order of magnitude faster
    - Array optimizations (substitution, refinements, simplification)
The mechanics
- User marks input to treat symbolically
- Compile with the EXE compiler, exe-cc. It uses CIL to:
  - Insert checks around every expression: if operands are all concrete, run as normal; otherwise, add as constraint
  - Insert fork calls when symbolic values could cause multiple actions
- ./a.out: forks at each decision point
  - When a path terminates, use STP to solve constraints
  - Terminates when: (1) exit, (2) crash, (3) EXE detects an error
- Rerun concrete inputs through uninstrumented code
Isn’t exponential expensive?
- Only fork on symbolic branches. Most are concrete (linear).
- Loops? Heuristics.
  - Default: DFS. Processes grow linearly with chain depth. Can get stuck.
  - “Best first” search: choose branch, backtrack to the point that will run code hit fewest times.
  - Can do better…
- However: happy to let it run for weeks as long as it is generating interesting test cases. Competition is manual and random.
Mixed execution
- Basic idea: given an expression (e.g., deref, ALU op):
  - If all of its operands are concrete, just do it.
  - If any are symbolic, add as constraint.
  - If current constraints are impossible, stop.
  - If current path hits error or exit(), solve + emit.
  - If it calls uninstrumented code: do call, or solve and do call.
- Example: “x = y + z”
  - If y, z both concrete, execute. Record x = concrete.
  - Otherwise set “x = y + z”, record x = symbolic.
- Result:
  - Most code runs concretely: small slice deals with symbolics.
  - Robust: do not need all source code (e.g., OS). Just run
Limits
- Missed constraints: if code calls asm, or CIL cannot eat the file.
- STP cannot do div/mod: constrain the divisor to be a power of 2, then use shift and mask respectively.
- Cannot handle **p where “p” is symbolic: must concretize *p. (Note: **p still symbolic.)
- Stops path if it cannot solve; can get lost in exponentials.
- Missing: no symbolic function pointers; symbolics passed to varargs are not tracked. No floating point. long long support is erratic.
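One reading of the div/mod workaround, shown for unsigned 32-bit values: once the divisor is constrained to a power of two, division becomes a right shift and modulo becomes a bit mask, both of which a bitvector solver handles. The helper names are hypothetical, and this sketch demonstrates only the arithmetic identity, not how EXE emits the constraint.

```c
#include <assert.h>
#include <stdint.h>

/* True if d is a power of two (non-zero with a single bit set). */
int is_pow2(uint32_t d) { return d != 0 && (d & (d - 1)) == 0; }

/* log2 of a power of two. */
unsigned log2_pow2(uint32_t d) {
    assert(is_pow2(d));
    unsigned k = 0;
    while ((d >> k) != 1u)
        k++;
    return k;
}

/* x / d rewritten as a shift; valid for unsigned x and power-of-two d. */
uint32_t div_pow2(uint32_t x, uint32_t d) { return x >> log2_pow2(d); }

/* x % d rewritten as a mask; valid under the same assumptions. */
uint32_t mod_pow2(uint32_t x, uint32_t d) {
    assert(is_pow2(d));
    return x & (d - 1);
}
```

For d = 8 this gives `x >> 3` and `x & 7`, which agree with `x / 8` and `x % 8` for every unsigned x.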
EXE Results
- Berkeley Packet Filter
  - Two buffer overflow exploits
- udhcpd: well-tested user-level DHCP server
  - Five memory errors
- PCRE: Perl Compatible Regular Expressions
  - Many out-of-bounds writes leading to abort in glibc on free
- Disks of death: file systems
  - Four bugs on ext2 & ext3 file systems
  - Null pointer dereference in JFS
A galactic view [Oakland’06]
KLEE
- Thanks to Cristian Cadar for the slides
Writing Systems Code Is Hard
- Code complexity
  - Tricky control flow
  - Complex dependencies
  - Abusive use of pointer operations
- Environmental dependencies
  - Code has to anticipate all possible interactions
  - Including malicious ones
KLEE [OSDI 2008, Best Paper Award]
- Based on symbolic execution and constraint solving techniques
- Automatically generates high coverage test suites
- Finds deep bugs in complex systems programs
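Toy Example

    int bad_abs(int x)
    {
        if (x < 0)
            return -x;
        if (x == 1234)
            return -x;
        return x;
    }

Execution tree, starting from x = * (any int):
- x < 0 ?
  - TRUE: constraint x < 0, return -x; test1.out: x = -2
  - FALSE: constraint x ≥ 0; x == 1234 ?
    - TRUE: constraint x = 1234, return -x; test2.out: x = 1234
    - FALSE: constraint x ≠ 1234, return x; test3.out: x = 3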
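To see what the three generated inputs buy us, they can be replayed against the function. The oracle below (an absolute value is never negative) and the helper `test_passes` are assumptions added for illustration; the slide only lists the inputs.

```c
#include <assert.h>

int bad_abs(int x) {
    if (x < 0)
        return -x;
    if (x == 1234)
        return -x;   /* bug: yields -1234 */
    return x;
}

/* The three inputs from the slide, one per path
 * (test1.out, test2.out, test3.out). */
static const int generated_inputs[3] = { -2, 1234, 3 };

/* Assumed oracle: an absolute value is never negative.
 * Returns 1 if generated input i passes, 0 otherwise. */
int test_passes(int i) {
    assert(i >= 0 && i < 3);
    return bad_abs(generated_inputs[i]) >= 0;
}
```

Only the second input fails the oracle, which is exactly the path guarded by `x == 1234`. With KLEE itself, x would instead be marked symbolic (for example with `klee_make_symbolic(&x, sizeof(x), "x")` from `klee/klee.h`), and the tool would derive such inputs on its own.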
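KLEE Architecture
[Figure: C code is compiled by LLVM into LLVM bytecode, which KLEE executes against a symbolic environment. KLEE sends path constraints (e.g., x ≥ 0, x ≠ 1234) to the constraint solver (STP), which returns a solution (x = 3). Generated test inputs: x = -2, x = 1234, x = 3.]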
Outline
- Motivation
- Example and Basic Architecture
- Scalability Challenges
- Experimental Evaluation
Three Big Challenges
- Motivation
- Example and Basic Architecture
- Scalability Challenges
  - Exponential search space
  - Expensive constraint solving
  - Interaction with environment
- Experimental Evaluation
Exponential Search Space
- Naïve exploration can easily get “stuck”
- Use search heuristics:
  - Coverage-optimized search
    - Favor paths that recently hit new code
  - Random path search
Constraint Solving
- Dominates runtime
  - Invoked at every branch
- Two simple and effective optimizations
  - Eliminating irrelevant constraints
  - Caching solutions
- Dramatic speedup on our benchmarks
Eliminating Irrelevant Constraints
- In practice, each branch usually depends on a small number of variables

    …
    …
    if (x < 10) {
        …
    }

- Path constraints so far: x + y > 10, z & -z = z
- Query at the branch: x < 10 ?
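A rough sketch of the idea: summarize each constraint by the set of variables it mentions, then keep only the constraints that are transitively connected to the variables in the branch query. The bitmask encoding and the function `relevant_constraints` are hypothetical; they are not KLEE's data structures.

```c
#include <assert.h>

/* Variables as bits in a set (assumed encoding). */
enum { VAR_X = 1u << 0, VAR_Y = 1u << 1, VAR_Z = 1u << 2, VAR_W = 1u << 3 };
typedef unsigned VarSet;

/* Marks in keep[] every constraint that shares a variable, directly or
 * through other kept constraints, with the query. Returns how many
 * constraints were kept. */
int relevant_constraints(const VarSet *cons, int n, VarSet query, int *keep) {
    assert(n >= 0);
    VarSet reach = query;
    int kept = 0, changed = 1;
    for (int i = 0; i < n; i++)
        keep[i] = 0;
    while (changed) {            /* iterate until no new constraint joins */
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (!keep[i] && (cons[i] & reach)) {
                keep[i] = 1;
                reach |= cons[i];
                kept++;
                changed = 1;
            }
        }
    }
    return kept;
}
```

For the slide's example, the constraints are `{x, y}` (from x + y > 10) and `{z}` (from z & -z = z). With the query on `{x}`, only the first is kept, so the solver never sees the constraint on z.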
Caching Solutions
- Static set of branches: lots of similar constraint sets
- Cached: { 2 * y < 100, x > 3, x + y > 10 } has solution x = 5, y = 15
- Eliminating constraints cannot invalidate a solution: { 2 * y < 100, x + y > 10 } is still satisfied by x = 5, y = 15
- Adding constraints often does not invalidate a solution: { 2 * y < 100, x > 3, x + y > 10, x < 10 } is still satisfied by x = 5, y = 15
- UBTree data structure [Hoffmann and Koehler, IJCAI ’99]
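The two cache rules can be sketched with constraint sets encoded as bitmasks. Note the substitution: a plain linear scan stands in for the UBTree index, and the constraint table, `CacheEntry`, and `cache_lookup` are hypothetical names used only for this illustration.

```c
#include <assert.h>

typedef struct { int x, y; } Assign;
typedef int (*Constraint)(Assign);

/* The four constraints from the slide, one bit position each. */
static int c0(Assign a) { return 2 * a.y < 100; }
static int c1(Assign a) { return a.x > 3; }
static int c2(Assign a) { return a.x + a.y > 10; }
static int c3(Assign a) { return a.x < 10; }
static const Constraint table[4] = { c0, c1, c2, c3 };

typedef struct { unsigned set; Assign sol; } CacheEntry;

/* Checks an assignment against every constraint in the set. */
static int satisfies(unsigned set, Assign a) {
    for (int i = 0; i < 4; i++)
        if (((set >> i) & 1u) && !table[i](a))
            return 0;
    return 1;
}

/* Returns 1 and fills *out if a cached solution answers the query.
 * Linear scan here; a UBTree would index the subset/superset lookups. */
int cache_lookup(const CacheEntry *cache, int n, unsigned query, Assign *out) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        unsigned s = cache[i].set;
        int subset   = (query & ~s) == 0;   /* query is a subset of the cached set */
        int superset = (s & ~query) == 0;   /* cached set is a subset of the query */
        /* Subset: the solution is valid by construction.
         * Superset: cheap to re-check the old solution before solving. */
        if (subset || (superset && satisfies(query, cache[i].sol))) {
            *out = cache[i].sol;
            return 1;
        }
    }
    return 0;
}
```

With the entry `{c0, c1, c2}` mapped to x = 5, y = 15, a query for `{c0, c2}` hits by the subset rule, and a query for all four constraints hits because the cached assignment still satisfies x < 10.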
Dramatic Speedup
- Aggregated data over 73 applications
- [Figure: time (s) plotted against executed instructions (normalized)]
Environment: Calling Out Into OS

    int fd = open("t.txt", O_RDONLY);

- If all arguments are concrete, forward to OS

    int fd = open(sym_str, O_RDONLY);

- Otherwise, provide models that can handle symbolic files
Environmental Modeling

    // actual implementation: ~50 LOC
    ssize_t read(int fd, void *buf, size_t count) {
        exe_file_t *f = get_file(fd);
        …
        memcpy(buf, f->contents + f->off, count);
        f->off += count;
        …
    }

- Plain C code run by KLEE
- Currently: effective support for symbolic command line arguments, files, links, pipes, ttys, environment vars
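A self-contained sketch in the spirit of the model above: a file is just an in-memory buffer plus an offset, so the same plain C code works whether the bytes are concrete or symbolic. The names `model_file_t`, `model_open`, and `model_read`, the fixed-size descriptor table, and the short-read handling are assumptions for illustration; the parts elided on the slide are not reconstructed here.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

#define MAX_FILES 4      /* assumed size of the descriptor table */

/* Hypothetical in-memory file; in a symbolic run, contents would hold
 * symbolic bytes. */
typedef struct {
    char  *contents;
    size_t size;
    size_t off;
} model_file_t;

static model_file_t files[MAX_FILES];

/* Registers a buffer as the file behind descriptor fd (illustrative helper). */
int model_open(int fd, char *contents, size_t size) {
    if (fd < 0 || fd >= MAX_FILES)
        return -1;
    files[fd] = (model_file_t){ contents, size, 0 };
    return fd;
}

/* Model of read(): copies from the buffer and advances the offset. */
ssize_t model_read(int fd, void *buf, size_t count) {
    if (fd < 0 || fd >= MAX_FILES || files[fd].contents == NULL)
        return -1;
    model_file_t *f = &files[fd];
    assert(f->off <= f->size);
    size_t left = f->size - f->off;
    if (count > left)
        count = left;                 /* short read at end of file */
    memcpy(buf, f->contents + f->off, count);
    f->off += count;
    return (ssize_t)count;
}
```

Reading 3 bytes and then 8 bytes from a 5-byte file returns 3 and then 2, and a further read returns 0, matching the usual end-of-file behaviour of read().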
Does KLEE work?
- Motivation
- Example and Basic Architecture
- Scalability Challenges
- Evaluation
  - Bug finding
  - Crosschecking
GNU Coreutils Suite
- Core user-level apps installed on many UNIX systems
- 89 stand-alone (i.e. excluding wrappers) apps (v6.10)
  - Management of system properties: hostname, printenv, etc.
  - Text file processing: sort, wc, od, etc.
  - …
- Variety of functions, different authors, intensive interaction with environment
- Heavily tested, mature code

Symbolic Execution And KLEE

  • 1. Symbolic execution: Overview of work done by Dawson Engler’s group at Stanford (EGT/EXE/KLEE*), by Shauvik Roy Choudhary, http://cc.gatech.edu/~shauvik. Some slides adapted from the EXE and KLEE presentations, plus slides from Saswat.
  • 2. Old research area but still active. First introduced in 1975 (source: Saswat); 1976 by James King, IBM T. J. Watson. Very active area of research, e.g. EGT / EXE / KLEE [Stanford], DART [Bell Labs], CUTE [UIUC], SAGE, Pex [MSR Redmond], Vigilante [MSR Cambridge], BitScope [Berkeley/CMU], CatchConv [Berkeley], JPF [NASA Ames].
  • 3. Symbolic Execution. Symbolic execution refers to executing a program with symbols as arguments. Unlike concrete execution, in symbolic execution the program can take any feasible path (limitation: the constraint solver). During symbolic execution, the program state consists of symbolic values for some memory locations and a path condition. The path condition is a conjunction of constraints on the symbolic input values. A solution of the path condition is a test input that covers the respective path.
  • 4. Implementation of Symbolic Execution. Transformation approach: transform the program into another program that operates on symbolic values, such that executing the transformed program is equivalent to symbolically executing the original; difficult to implement, a portable solution, suitable for Java and .NET. Instrumentation approach: callback hooks are inserted into the program so that symbolic execution is done in the background during normal execution; easy to implement for C. Customized runtime approach: customize the runtime (e.g., the JVM) to support symbolic execution; applicable to Java and .NET, difficult to implement, flexible, not portable. Example tools named on the slide: CUTE, KLEE, JPF.
  • 5. Limitations of Symbolic Execution. Limited by the power of the constraint solver: cannot handle non-linear and very complex constraints. Does not scale when the number of paths is large (subject of ongoing research in this area). Source code, or an equivalent (e.g., Java class files), is required for precise symbolic execution.
  • 6. EGT & EXE. Slides based on D. Engler’s slides.
  • 7. Goal: find many bugs in systems code. Generic features: baroque interfaces, tricky input, rat’s nest of conditionals; an enormous undertaking to hit with manual testing. Random “fuzz” testing: the charm is no manual work, but blind generation makes it hard to hit errors that occur only for a narrow input range, and also hard to hit errors that require structure. This talk: a simple trick to finesse this.
  • 8. EGT: Execution Generated Testing [SPIN’05]: how to make system code crash itself! Basic idea: use the code itself to construct its input. Basic algorithm: symbolic execution + constraint solving. Run the code on symbolic inputs whose initial value is “anything”. As the code observes its inputs, it tells us what values they can be. At a conditional that uses symbolic input, fork: on the true branch, add the constraint that the input satisfies the check; on the false branch, that it does not. Then generate inputs from these constraints and re-run the code on them.
  • 9. The toy example: initialize x to be “any int”. The code will run 3 times; solve the constraints at the end of each run to get our 3 test cases.
  • 10. The big picture. Implementation prototype: do source-to-source transformation using CIL; use the CVCL decision procedure to solve constraints, then re-run the code on concrete values. Robustness: use mixed symbolic and concrete execution. 3 ways to look at what’s going on: grammar extraction; turning code inside out, from input consumer to input generator; a sort-of Heisenberg effect, where observations perturb symbolic inputs into increasingly concrete ones (more definite observation = more definite perturbation).
  • 11. Mixed execution. Basic idea, given an operation: if all of its operands are concrete, just do it; if any are symbolic, add a constraint; if the current constraints are impossible, stop; if the current path causes something to blow up, solve & emit; if the current path calls an unmodelled function, solve & call; if the program exits, solve & emit. How to track? Use variable addresses to determine whether a value is symbolic or concrete. Note: symbolic assignment is not destructive; it creates a new symbol.
  • 12. Example transformation “+”: each variable v has v.concrete and v.symbolic fields. If v is concrete, v.symbolic = <invalid>, and vice versa.
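The slide's code for the “+” transformation is not preserved in this transcript. The following is only a minimal sketch of the idea, assuming a hypothetical `struct value` with a concrete field and a textual symbolic field; the names `make_concrete`, `make_symbolic` and `value_add` are illustrative and are not taken from EGT or EXE. Addition runs natively when both operands are concrete and otherwise records a new symbolic expression.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EXPR_MAX 64

/* Hypothetical shadow value: exactly one of the two fields is valid. */
struct value {
    int is_symbolic;          /* 0: concrete is valid, 1: symbolic is valid */
    int concrete;
    char symbolic[EXPR_MAX];  /* textual expression, e.g. "(y + 3)" */
};

struct value make_concrete(int c) {
    struct value v;
    v.is_symbolic = 0;
    v.concrete = c;
    v.symbolic[0] = '\0';     /* symbolic side is <invalid> */
    return v;
}

struct value make_symbolic(const char *name) {
    struct value v;
    assert(name != NULL);
    v.is_symbolic = 1;
    v.concrete = 0;           /* concrete side is <invalid> */
    snprintf(v.symbolic, sizeof v.symbolic, "%s", name);
    return v;
}

/* Render either kind of value as text for use inside an expression. */
static void render(const struct value *v, char *out, size_t n) {
    if (v->is_symbolic)
        snprintf(out, n, "%s", v->symbolic);
    else
        snprintf(out, n, "%d", v->concrete);
}

/* x = y + z: concrete when both operands are concrete, otherwise a new
   symbolic expression. Overlong expressions are truncated in this sketch. */
struct value value_add(struct value y, struct value z) {
    char ys[EXPR_MAX], zs[EXPR_MAX];
    struct value x;

    if (!y.is_symbolic && !z.is_symbolic)
        return make_concrete(y.concrete + z.concrete);

    render(&y, ys, sizeof ys);
    render(&z, zs, sizeof zs);
    x.is_symbolic = 1;
    x.concrete = 0;
    snprintf(x.symbolic, sizeof x.symbolic, "(%s + %s)", ys, zs);
    return x;
}
```

Under these assumptions, `value_add(make_concrete(2), make_concrete(3))` yields the concrete value 5, while `value_add(make_symbolic("y"), make_concrete(3))` yields the symbolic expression `(y + 3)`. A real tool would store an expression tree for the solver rather than a string.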
  • 14. Results. Mutt: versions <= 1.4 have a buffer overflow (OSDI paper); with input size 4, it took 34 minutes to generate 458 tests with 98% statement coverage. printf (3 implementations: Pintos, gccfast, embedded): made format strings symbolic and found two bugs: incorrect grouping of integers, and incorrect handling of plus flags (“%” followed by space).
  • 15. More: WsMP3 server case study, 2000 LOC. Technique: make recv input symbolic. Found a known security hole + 2 new bugs: a network-controlled infinite loop and a buffer overflow.
  • 16. EXE: EXecution generated Executions [CCS’06]: automatically generating inputs of death! Same ideas as EGT. Main contributions: a more practical tool that can test any code path and generates actual attacks. Constraint solver: STP, a decision procedure for bitvectors and arrays. If solvable, passes constraints to MiniSAT. About four times less code than CVCL and an order of magnitude faster. Array optimizations (substitution, refinements, simplification).
  • 17. The mechanics. The user marks input to treat symbolically (the slide’s two ways of doing so are not preserved in this transcript). Compile with the EXE compiler, exe-cc, which uses CIL to insert checks around every expression (if the operands are all concrete, run as normal; otherwise, add as a constraint) and to insert fork calls when a symbolic value could cause multiple actions. ./a.out forks at each decision point. When a path terminates, use STP to solve the constraints. A path terminates on: (1) exit, (2) crash, (3) EXE detecting an error. Rerun the concrete inputs through uninstrumented code.
  • 18. Isn’t exponential expensive? Only fork on symbolic branches; most are concrete (linear). Loops? Heuristics. Default: DFS, which gives linear processes with chain depth but can get stuck. “Best first” search: choose a branch, then backtrack to the point that will run the code hit the fewest times. Can do better… However: happy to let it run for weeks as long as it is generating interesting test cases. The competition is manual and random testing.
  • 19. Mixed execution. Basic idea, given an expression (e.g., deref, ALU op): if all of its operands are concrete, just do it; if any are symbolic, add as a constraint; if the current constraints are impossible, stop; if the current path hits an error or exit(), solve + emit; if it calls uninstrumented code, do the call, or solve and do the call. Example, “x = y + z”: if y and z are both concrete, execute and record x = concrete; otherwise set “x = y + z” and record x = symbolic. Result: most code runs concretely, and only a small slice deals with symbolics. Robust: do not need all source code (e.g., OS). Just run
  • 20. Limits. Missed constraints: if the code calls asm, or CIL cannot eat the file. STP cannot do div/mod: constrain the divisor to be a power of 2 and use shift and mask respectively. Cannot handle **p where “p” is symbolic: must concretize *p. (Note: **p is still symbolic.) Stops a path if it cannot solve; can get lost in exponentials. Missing: no symbolic function pointers; symbolics passed to varargs are not tracked; no floating point; long long support is erratic.
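The div/mod workaround relies on a standard identity for unsigned integers: when the divisor is a power of two, division is a right shift and the remainder is a bit mask. The sketch below illustrates that identity only; the helper names are hypothetical and this is not EXE's actual rewriting code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if d is a nonzero power of two. */
int is_power_of_two(uint32_t d) {
    return d != 0 && (d & (d - 1)) == 0;
}

/* Base-2 logarithm of a power of two (number of trailing zero bits). */
unsigned log2_pow2(uint32_t d) {
    unsigned k = 0;
    assert(is_power_of_two(d));
    while (d > 1) {
        d >>= 1;
        k++;
    }
    return k;
}

/* x / d rewritten as a shift, valid only when d is a power of two. */
uint32_t div_pow2(uint32_t x, uint32_t d) {
    return x >> log2_pow2(d);
}

/* x % d rewritten as a mask, valid only when d is a power of two. */
uint32_t mod_pow2(uint32_t x, uint32_t d) {
    assert(is_power_of_two(d));
    return x & (d - 1);
}
```

For example, `div_pow2(100, 8)` equals `100 / 8` and `mod_pow2(100, 8)` equals `100 % 8`. The identity holds for unsigned operands; signed division rounds toward zero and needs extra care.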
  • 21. EXE Results. Berkeley Packet Filter: two buffer overflow exploits. udhcpd, a well-tested user-level DHCP server: five memory errors. PCRE (Perl Compatible Regular Expressions): many out-of-bounds writes leading to an abort in glibc on free. Disks of death (file systems): four bugs in the ext2 & ext3 file systems; a null pointer dereference in JFS.
  • 22. A galactic view [Oakland’06]
  • 23. KLEE. Thanks to Cristian Cadar for the slides.
  • 24. Writing Systems Code Is Hard. Code complexity: tricky control flow, complex dependencies, abusive use of pointer operations. Environmental dependencies: code has to anticipate all possible interactions, including malicious ones.
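  • 26. Toy Example: int bad_abs(int x) { if (x < 0) return -x; if (x == 1234) return -x; return x; }
      Execution tree, starting from x = * (unconstrained):
      x < 0 is TRUE: return -x; generated test x = -2 (test1.out).
      x < 0 is FALSE (x ≥ 0), then x = 1234 is TRUE: return -x; generated test x = 1234 (test2.out).
      x < 0 is FALSE (x ≥ 0), then x = 1234 is FALSE (x ≠ 1234): return x; generated test x = 3 (test3.out).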
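The toy example can be replayed concretely. The sketch below restates `bad_abs` from the slide and adds two hypothetical helpers, `path_of` and `violates_abs_property`, that are not part of the original deck: the first classifies a concrete input by the path condition it satisfies, the second checks the property that an absolute value is never negative. Under KLEE the input would be marked symbolic instead (for example with `klee_make_symbolic`), which this plain C sketch does not do.

```c
#include <assert.h>

/* Toy function from the slide. The second branch is the bug: it negates
   a non-negative value. */
int bad_abs(int x) {
    if (x < 0)
        return -x;
    if (x == 1234)
        return -x;   /* buggy path */
    return x;
}

/* Illustrative path identifiers, one per leaf of the execution tree. */
enum path { PATH_NEGATIVE = 1, PATH_MAGIC = 2, PATH_DEFAULT = 3 };

/* Which path condition does a concrete input satisfy? */
enum path path_of(int x) {
    if (x < 0)
        return PATH_NEGATIVE;   /* x < 0 */
    if (x == 1234)
        return PATH_MAGIC;      /* x >= 0 and x == 1234 */
    return PATH_DEFAULT;        /* x >= 0 and x != 1234 */
}

/* An absolute value should never be negative (INT_MIN aside). */
int violates_abs_property(int x) {
    return bad_abs(x) < 0;
}

/* Replays the three generated tests and checks that each one covers a
   different path. */
void replay_generated_tests(void) {
    assert(path_of(-2) == PATH_NEGATIVE);
    assert(path_of(1234) == PATH_MAGIC);
    assert(path_of(3) == PATH_DEFAULT);
}
```

The three inputs from the slide (-2, 1234, 3) each satisfy a different path condition, and only x = 1234 exposes the bug. A random tester would have to pick that exact value out of the whole int range.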
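  • 27. KLEE Architecture. [Diagram: C code is compiled by LLVM to LLVM bytecode, which KLEE executes within a symbolic environment; path constraints such as x ≥ 0 and x ≠ 1234 are sent to the constraint solver (STP), which returns a solution such as x = 3; generated tests: x = -2, x = 1234, x = 3.]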
  • 28. Outline: Motivation; Example and Basic Architecture; Scalability Challenges; Experimental Evaluation.
  • 39. Caching solutions: dramatic speedup on our benchmarks.
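  • 40. Eliminating Irrelevant Constraints. In practice, each branch usually depends on a small number of variables. Example: the path constraints are x + y > 10 and z & -z = z, and execution reaches the branch if (x < 10) { … }, so the query is “x < 10 ?”. Only x + y > 10 shares a variable with the query, so z & -z = z can be dropped before calling the solver.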
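One way to picture the elimination step is as a reachability computation over shared variables. The sketch below is an illustration under simplifying assumptions, not KLEE's implementation: each constraint is reduced to a bitmask of the variables it mentions, and the hypothetical function `relevant_constraints` keeps every constraint that is transitively connected to the query's variables.

```c
#include <assert.h>

/* Variables are bit positions in a mask; the names are illustrative. */
enum { VAR_X = 1 << 0, VAR_Y = 1 << 1, VAR_Z = 1 << 2 };

/* vars[i] is the set of variables mentioned by constraint i.
   query_vars is the set of variables mentioned by the branch condition.
   On return, relevant[i] is 1 if constraint i can influence the query
   (directly or through shared variables) and 0 otherwise.
   Returns the number of relevant constraints. */
int relevant_constraints(const unsigned *vars, int n,
                         unsigned query_vars, int *relevant) {
    unsigned reach = query_vars;   /* variables known to matter so far */
    int count = 0;
    int changed = 1;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        relevant[i] = 0;

    /* Iterate to a fixed point: a constraint that becomes relevant may
       pull in further variables, and with them further constraints. */
    while (changed) {
        changed = 0;
        for (i = 0; i < n; i++) {
            if (!relevant[i] && (vars[i] & reach) != 0) {
                relevant[i] = 1;
                reach |= vars[i];
                count++;
                changed = 1;
            }
        }
    }
    return count;
}
```

With the slide's example, constraint 0 (x + y > 10) mentions {x, y}, constraint 1 (z & -z = z) mentions {z}, and the query x < 10 mentions {x}; only constraint 0 is marked relevant.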
  • 41. Caching Solutions. Static set of branches: lots of similar constraint sets. The set {2 * y < 100, x > 3, x + y > 10} has the solution x = 5, y = 15. Eliminating constraints cannot invalidate a solution: {2 * y < 100, x + y > 10} is still satisfied by x = 5, y = 15. Adding constraints often does not invalidate a solution: {2 * y < 100, x > 3, x + y > 10, x < 10} is still satisfied by x = 5, y = 15. UBTree data structure [Hoffmann and Koehler, IJCAI ’99].
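The two observations can be checked directly on the slide's numbers. The sketch below is a simplification: it re-validates one cached assignment against a constraint set by evaluating every constraint, whereas the slide names a UBTree for indexing cached sets. The struct, function, and array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A candidate solution for the two symbolic variables of the example. */
struct assignment { int x; int y; };

/* A constraint is a predicate over an assignment. */
typedef int (*constraint_fn)(struct assignment);

int c_2y_lt_100(struct assignment a) { return 2 * a.y < 100; }
int c_x_gt_3(struct assignment a)    { return a.x > 3; }
int c_sum_gt_10(struct assignment a) { return a.x + a.y > 10; }
int c_x_lt_10(struct assignment a)   { return a.x < 10; }

/* The three constraint sets from the slide. */
const constraint_fn ORIGINAL_SET[] = { c_2y_lt_100, c_x_gt_3, c_sum_gt_10 };
const constraint_fn SUBSET[]       = { c_2y_lt_100, c_sum_gt_10 };
const constraint_fn SUPERSET[]     = { c_2y_lt_100, c_x_gt_3, c_sum_gt_10,
                                       c_x_lt_10 };

/* Returns 1 if the assignment satisfies every constraint in the set. */
int satisfies_all(struct assignment a, const constraint_fn *cs, size_t n) {
    size_t i;
    assert(cs != NULL);
    for (i = 0; i < n; i++)
        if (!cs[i](a))
            return 0;
    return 1;
}
```

With the cached solution x = 5, y = 15, `satisfies_all` returns 1 for all three sets, so neither the subset nor the superset query needs a solver call. Evaluating a cached assignment is much cheaper than solving, which is why trying it first on a superset is worthwhile even though it can fail.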
  • 42. Dramatic Speedup: aggregated data over 73 applications. [Plot: time (s) against executed instructions (normalized).]
  • 52. Management of system properties: hostname, printenv, etc.
  • 53. Text file processing: sort, wc, od, etc.
  • 54. … Variety of functions, different authors, intensive interaction with environment. Heavily tested, mature code.
  • 55. Coreutils ELOC (incl. called lib). [Histogram: number of applications against executable lines of code (ELOC).]
  • 57. High Line Coverage (Coreutils, non-lib, 1h/utility = 89 h). Overall: 84%, average 91%, median 95%; 16 apps at 100%. [Plot: coverage (ELOC %), apps sorted by KLEE coverage.]
  • 58. Beats 15 Years of Manual Testing. Average coverage per utility: KLEE 91%, manual 68%. Manual tests also check correctness. [Plot: KLEE coverage minus manual coverage, apps sorted by that difference.]
  • 59. Busybox Suite for Embedded Devices. Overall: 91%, average 94%, median 98%; 31 apps at 100%. [Plot: coverage (ELOC %), apps sorted by KLEE coverage.]
  • 60. Busybox: KLEE vs. Manual. Average coverage per utility: KLEE 94%, manual 44%. [Plot: KLEE coverage minus manual coverage, apps sorted by that difference.]
  • 65. KLEE generates actual command lines exposing crashes.
  • 66. Ten command lines of death:
      md5sum -c t1.txt
      mkdir -Z a b
      mkfifo -Z a b
      mknod -Z a b p
      seq -f %0 1
      pr -e t2.txt
      tac -r t3.txt t3.txt
      paste -d abcdefghijklmnopqrstuvwxyz
      ptx -F abcdefghijklmnopqrstuvwxyz
      ptx x t4.txt
      Input files:
      t1.txt: MD5(
      t2.txt:
      t3.txt:
      t4.txt: A
  • 71. An assert is just a branch, and KLEE proves feasibility/infeasibility of each branch it reaches.
  • 72. If KLEE determines infeasibility of the false side of an assert, the assert was proven on the current path.
  • 73. Crosschecking. Assume f(x) and f’(x) implement the same interface. Make the input x symbolic and run KLEE on assert(f(x) == f’(x)). For each explored path: if KLEE terminates without error, the paths are equivalent; if KLEE terminates with an error, a mismatch was found. Coreutils vs. Busybox: UNIX utilities should conform to IEEE Std. 1003.1. Crosschecked pairs of Coreutils and Busybox apps: verified paths, found mismatches.
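The crosschecking pattern can be sketched in plain C by substituting exhaustive enumeration of a small input domain for symbolic execution. That substitution is an assumption made so that the example runs without KLEE: under KLEE the loop would be replaced by a single symbolic input and one assert. The three `is_lower_*` implementations and `first_mismatch` are hypothetical and are not the deck's demo code.

```c
#include <assert.h>
#include <stddef.h>

/* Two implementations intended to compute the same predicate. */
int is_lower_range(int c)  { return c >= 'a' && c <= 'z'; }
int is_lower_offset(int c) { return (unsigned)(c - 'a') < 26u; }

/* A deliberately buggy variant with an off-by-one upper bound. */
int is_lower_buggy(int c)  { return c >= 'a' && c < 'z'; }

typedef int (*pred_fn)(int);

/* Crosschecks f against g on every input in [0, 255].
   Returns the first input on which they disagree, or -1 if none.
   Enumeration stands in for a symbolic input here. */
int first_mismatch(pred_fn f, pred_fn g) {
    int c;
    assert(f != NULL && g != NULL);
    for (c = 0; c <= 255; c++) {
        /* Compare truth values, not raw ints. */
        if ((f(c) != 0) != (g(c) != 0))
            return c;
    }
    return -1;
}
```

Here `first_mismatch(is_lower_range, is_lower_offset)` returns -1, while comparing against `is_lower_buggy` returns the character code of 'z'. Enumeration only works for tiny domains; the point of the symbolic version is that one run covers each path for all inputs that reach it.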
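  • 74. Mismatches Found (columns on the slide: Input, Busybox, Coreutils):
      tee "" <t1.txt: Busybox [infinite loop]; Coreutils [terminates]
      tee -: Busybox [copies once to stdout]; Coreutils [copies twice]
      comm t1.txt t2.txt: Busybox [doesn’t show diff]; Coreutils [shows diff]
      cksum /: Busybox "4294967295 0 /"; Coreutils "/: Is a directory"
      split /: "/: Is a directory" (only one output is recorded for this input)
      tr: Busybox [duplicates input]; Coreutils "missing operand"
      [ 0 "<" 1 ]: "binary op. expected" (only one output is recorded for this input)
      tail -2l: Busybox [rejects]; Coreutils [accepts]
      unexpand -f: Busybox [accepts]; Coreutils [rejects]
      split -: Busybox [rejects]; Coreutils [accepts]
      Input files: t1.txt: a; t2.txt: b (no newlines!)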
  • 76. KLEE DEMO. Tool available at http://klee.llvm.org/. Experiments: tool examples (isLower(), RegExp); more experimentation.
  • 77. Discussion: Questions / Ideas? Thanks for listening!