3. Deep packet inspection (DPI)
no DPI
• packet header lookup
• route based on destination (unless policy-based routing, PBR, is used)
• classify with static rules or state data
• cheap
DPI
• packet header and payload lookup
• may route based on content (e.g. separate uplinks for priority and "bulky" traffic)
• classify with static rules, state data,
multiple patterns and custom logic
• expensive?
4. 100+ Gbit DPI – why?
• end customers typically have < 10G uplinks
– L7 filtering (WAF, IPS etc.) requested by enterprises
– multiple IDS, IPS, NGFW, UTM and WAFs on the market
– can be handled with open source tools
• 100G+ speeds: ISP/Telco/large DCs
– do not want to interfere with traffic
• unless hit by huge DDoS attack
• or kindly asked by local régime
5. Mirai botnet attacks – examples
• attack_tcp_stomp
– establish a legitimate TCP connection, then flood it
– not to be confused with the STOMP protocol
• attack_udp_dns
– DNS „water torture”, FQDN with random host
• attack_app_http
– HTTP request flood
• attack_app_cfnull
– HTTP POST junk
source: https://github.com/rosgos/Mirai-Source-Code
DPI may help
easy :)
6. Large DDoS attacks in 2016 – examples
1. 150M pps (650Gbps) of TCP SYN packets (mixed size), spoofed IPs
2. 1.75M rps peak of HTTP requests (~121B/r) from ~52k src IPs
3. 220k rps (360Gbps) of large HTTP requests from ~128k src IPs
4. ~1Tbps of recursive „water torture” DNS queries
sources:
• https://blog.cloudflare.com/say-cheese-a-snapshot-of-the-massive-ddos-attacks-coming-from-iot-cameras/
• https://www.incapsula.com/blog/650gbps-ddos-attack-leet-botnet.html
• http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/
DPI may help
7. 100Gbit/s sizing
• ~148.8 Mpps in small (64B) frames, but no payload to scan
• ~8.127 Mpps in 1514B frames
• ~12.19 GB/s of IP payload
• given a 16-core machine, our target is:
– ~0.5M – 2M lookups/s per core
– up to ~762 MB/s per core
– note: not all packets and not entire payloads have to be scanned
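The sizing figures above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: 20 B of per-frame wire overhead (preamble plus inter-frame gap), a 4 B FCS not counted in the 1514 B frame size, and "IP payload" taken as the whole 1500 B IP packet. The constant and function names are illustrative only.

```python
LINK_BPS = 100e9      # 100 Gbit/s link
WIRE_OVERHEAD = 20    # assumed: 8 B preamble/SFD + 12 B inter-frame gap
FCS = 4               # assumed: frame check sequence, not counted in 1514 B
ETH_HEADER = 14       # Ethernet header without VLAN tag
CORES = 16            # machine size used on the slide


def pps(frame_with_fcs: int) -> float:
    """Packets per second for a given frame size (FCS included)."""
    return LINK_BPS / ((frame_with_fcs + WIRE_OVERHEAD) * 8)


small_pps = pps(64)                                # minimum-size frames
large_pps = pps(1514 + FCS)                        # 1514 B frames
ip_bytes_per_s = large_pps * (1514 - ETH_HEADER)   # 1500 B of IP per frame
per_core_bytes = ip_bytes_per_s / CORES
per_core_lookups = large_pps / CORES

print(f"{small_pps / 1e6:.1f} Mpps in 64 B frames")
print(f"{large_pps / 1e6:.3f} Mpps in 1514 B frames")
print(f"{ip_bytes_per_s / 1e9:.2f} GB/s of IP payload")
print(f"{per_core_bytes / 1e6:.0f} MB/s per core")
print(f"{per_core_lookups / 1e6:.2f} M lookups/s per core (large frames)")
```

With large frames only, each core sees about 0.5M packets per second, which matches the lower end of the per-core target.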
8. Payload lookup – position
• fixed
– e.g. NTP
• network protocol aware
– e.g. DNS
• application aware
– e.g. HTTP
• anywhere in the packet
– bad idea
$ strings /usr/bin/* | grep -c sex
93
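The lookup positions listed above can be contrasted in a short sketch. It assumes the standard wire formats: the NTP mode is held in the low three bits of the first byte, and a DNS query name starts right after the 12-byte header as length-prefixed labels. The function names are hypothetical, and the parsing is deliberately minimal (no name compression, no bounds hardening).

```python
def is_ntp_private_mode(payload: bytes) -> bool:
    """Fixed position: test the NTP mode bits in byte 0 (mode 7 = private)."""
    return len(payload) >= 1 and (payload[0] & 0x07) == 7


def dns_qname(payload: bytes) -> str:
    """Protocol aware: walk the length-prefixed labels after the DNS header."""
    labels = []
    pos = 12  # fixed-size DNS header
    while pos < len(payload):
        length = payload[pos]
        if length == 0:
            break  # root label ends the name
        if length > 63:
            raise ValueError("compression pointer or invalid label length")
        labels.append(payload[pos + 1:pos + 1 + length].decode("ascii", "replace"))
        pos += 1 + length
    return ".".join(labels)


def contains_anywhere(payload: bytes, needle: bytes) -> bool:
    """Anywhere in the packet: cheap to write, prone to false positives."""
    return needle in payload


query = bytes(12) + b"\x03www\x07example\x03com\x00" + b"\x00\x01\x00\x01"
print(is_ntp_private_mode(b"\x17\x00\x03\x2a" + bytes(4)))
print(dns_qname(query))
print(contains_anywhere(b"GET /essex-county HTTP/1.1", b"sex"))
```

The last call returns True for a harmless URL, which is the same effect the `strings | grep` count above illustrates.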
9. Protocol design rant
"string: variable-length byte field, encoded in UTF-8, terminated by 0x00"
source: https://developer.valvesoftware.com/wiki/Server_queries
10. Software payload lookup – approaches
Method                                        Example
fixed position literal matching (sequence)    <you name it>
fixed position literal matching (trie)        DPDK ACL
computed position literal matching            tc u32
application aware classifier                  nDPI, netfilter l7-filter
application level gateway (ALG)               netfilter nf_conntrack_*
programmable data path                        netfilter xt_bpf, nftables, XDP+eBPF
embedded scripting language                   NPFLua, pflua
hybrid with state machines                    Hyperscan, Tempesta FW
regexp engine                                 Bro, Snort, Suricata
13. Finite-state machine
• abstract machine
• has states and transitions
• some states are "accept states"
• input updates machine state
• accepts or rejects an input sequence of symbols
sources:
• https://en.wikipedia.org/wiki/State_diagram
• https://en.wikipedia.org/wiki/Deterministic_finite_automaton
example: accepts binary strings with even number of zeroes
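The example machine can be written down as a transition table. This is a minimal sketch; the state names "even" and "odd" are chosen here for illustration and are not taken from the original diagram.

```python
# Two states track the parity of zeroes seen so far.
TRANSITIONS = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}
START = "even"
ACCEPT = {"even"}


def accepts(bits: str) -> bool:
    """Run the DFA over a binary string and report whether it is accepted."""
    state = START
    for symbol in bits:
        # exactly one transition per (state, symbol): the machine is deterministic
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPT


for sample in ("", "1001", "0", "10", "000"):
    print(repr(sample), accepts(sample))
```

Each input symbol costs one table lookup, which is why DFAs are attractive for payload scanning.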
14. DFA vs. NFA
• Deterministic finite automaton (DFA)
– each transition is uniquely determined by its source state and input symbol
– reading an input symbol is required for each state transition
• Nondeterministic finite automaton (NFA): otherwise
• NFA can be converted to DFA
– DFA is efficient to execute, but its state count may grow (exponentially in the worst case)
– NFA is easier to construct, but may be slower
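The conversion and the growth it can cause are easy to observe with the textbook subset construction. The sketch below uses a classic worst-case language chosen here for illustration (strings over {a, b} whose k-th symbol from the end is 'a'); the NFA has no epsilon transitions, and the helper names are hypothetical.

```python
from collections import deque


def build_nfa(k: int) -> dict:
    """NFA with k + 1 states: the k-th symbol from the end must be 'a'."""
    nfa = {(0, "a"): {0, 1}, (0, "b"): {0}}  # state 0 guesses where the 'a' is
    for state in range(1, k):
        for symbol in "ab":
            nfa[(state, symbol)] = {state + 1}
    return nfa


def subset_construction(nfa: dict, start: int, alphabet: str):
    """Build the DFA whose states are the reachable sets of NFA states."""
    start_set = frozenset([start])
    dfa, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, seen


for k in range(1, 7):
    _, dfa_states = subset_construction(build_nfa(k), 0, "ab")
    print(f"k={k}: NFA states={k + 1}, DFA states={len(dfa_states)}")
```

For this language the DFA needs 2^k states while the NFA needs only k + 1, which is the trade-off the bullets describe.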
tools:
• http://hackingoff.com/compilers/regular-expression-to-nfa-dfa
• http://ivanzuzak.info/noam/webapps/fsm_simulator/
15. PCRE vs. DFA and NFA
• PCRE (Perl Compatible Regular Expressions) engine is powerful
• a typical PCRE engine is implemented as NFA + backtracking
• a DFA matches only (pure) regular languages, so it can handle only a subset of PCRE patterns
• fewer features, faster engines!
– Hyperscan, https://01.org/hyperscan
– Perl Incompatible Regular Expressions, https://github.com/yandex/pire
16. Features considered harmful
• back-tracking (trial and error)
• back references, e.g. \1
• lookarounds (lookahead, lookbehind), e.g. (?<!a)b
• conditional regexps, e.g. (?(?=regex)then|else)
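Two of these features can be tried with Python's backtracking `re` module, used here only as a convenient stand-in for a PCRE-style engine. The sample patterns and inputs are illustrative.

```python
import re

# Back reference: \1 must repeat whatever group 1 captured, so the engine
# has to remember matched text. A plain DFA has no memory for that.
backref = re.compile(r"(a+)b\1")
backref_ok = backref.fullmatch("aabaa") is not None
backref_short = backref.fullmatch("aaba") is not None

# Negative lookbehind: match 'b' only when it is not preceded by 'a'.
lookbehind = re.compile(r"(?<!a)b")
after_a = lookbehind.search("ab")
after_c = lookbehind.search("cb")

print(backref_ok, backref_short)
print(after_a, after_c.start())
```

Both patterns are legal PCRE-style expressions, yet neither is the kind of construct a pure DFA engine handles directly.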
see also: http://www.regular-expressions.info
17. Case: catastrophic backtracking
• 34 min Stack Overflow outage in 2016
• \s+$
• "malformed post contained roughly 20,000 consecutive characters of whitespace on a comment line"
• O(n^2)
• in other cases it may be O(2^n)
sources:
• http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016
• http://www.regular-expressions.info/catastrophic.html
>>> sum(range(0,20001))
200010000
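The sum above comes from retrying the match at every start position. The sketch below is a simplified model of that behaviour, not a real regex engine: it counts how many characters a naive matcher examines for \s+$ when a run of whitespace is followed by one non-whitespace character. The function name is hypothetical and a small n keeps it fast.

```python
def whitespace_scan_steps(text: str) -> int:
    """Count characters examined while trying \\s+$ at every start position."""
    steps = 0
    for start in range(len(text)):
        pos = start
        while pos < len(text) and text[pos].isspace():
            steps += 1          # one more whitespace character consumed
            pos += 1
        if pos == len(text) and pos > start:
            return steps        # reached end of input: the pattern matches
        # otherwise '$' fails here and the search restarts one character later
    return steps


n = 200
measured = whitespace_scan_steps(" " * n + "x")
print(measured, n * (n + 1) // 2)
```

The measured count equals n(n+1)/2, so for n = 20,000 the same model gives the 200,010,000 steps shown above.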
18. Sources
1. „Finite State Machine Parsing for Internet Protocols: Faster Than You Think”,
http://www.cs.dartmouth.edu/~pete/pubs/LangSec-2014-fsm-parsers.pdf
2. „100G Intrusion Detection”, http://go.lbl.gov/100g
3. „DotStar: Breaking the Scalability and Performance Barriers in Regular Expression Set Matching”,
http://domino.watson.ibm.com/library/cyberdig.nsf/papers/F38C0227DBF5C7E78525758C005BD05C/$File/rc24645.pdf
4. „Fast Regular Expression Matching Using Dual Glushkov NFA”,
https://www-alg.ist.hokudai.ac.jp/~thomas/TCSTR/tcstr_14_73/tcstr_14_73.pdf
5. PIRE discussion: https://news.ycombinator.com/item?id=10209775
20. What is Hyperscan?
• „high-performance multiple regex matching library”
• C (run-time, API) and C++ (compiler), BSD licensed
• runs on Intel CPUs only, uses:
– SIMD (Single Instruction, Multiple Data)
– BMI (Bit Manipulation Instruction Sets)
• „typically used in a DPI library stack”
21. Hyperscan history
• developed by Sensory Networks
• 2003-2008 hardware prototypes (GPGPU, FPGA), NodalCore C-series accelerators
• 2009 software-based Hyperscan created (note: hardware approach dead end)
• 2009-2015 evolution (commercial)
• 2015 acquired by Intel, released under a BSD license
• 2017 v4.4 release
sources:
• https://01.org/hyperscan
• https://lists.01.org/pipermail/hyperscan/2017-January/000078.html
• "Hyperscan In SURICATA: STATE OF THE UNION"
23. How it works – regexp database
# pattern flags min offset max offset min length
0 ^foo
1 bar$
2 \w+baz\s{2} singlematch
3 \d+ leftmost 5
4 lorem\nipsum dotall 10
n ^(all|your|base) caseless 15
database is a group of regexps and their settings, thousands of regexps possible
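The shape of such a database can be sketched with Python's standard `re` module. This is a stand-in, not the Hyperscan API: it loops over patterns one by one instead of scanning them all in a single pass, and it reports one end offset per `finditer` match rather than every possible end offset. All names are hypothetical.

```python
import re

# (id, pattern, flags): a stand-in for rows of a compiled regexp database
PATTERNS = [
    (0, rb"^foo", 0),
    (1, rb"bar$", 0),
    (2, rb"\d+", 0),
    (3, rb"^(all|your|base)", re.IGNORECASE),  # "caseless"
]


def compile_database(rows):
    """Compile every pattern once, ahead of scanning."""
    return [(pid, re.compile(pattern, flags)) for pid, pattern, flags in rows]


def scan(database, data: bytes, on_match) -> None:
    """Call on_match(pattern_id, end_offset) for every match in data."""
    for pid, regex in database:
        for match in regex.finditer(data):
            on_match(pid, match.end())


database = compile_database(PATTERNS)
matches = []
scan(database, b"foo 123 bar", lambda pid, end: matches.append((pid, end)))
print(matches)  # [(0, 3), (1, 11), (2, 7)]
```

The callback receives only a pattern id and an end offset, so one input can trigger several rules at once.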
24. How it works – independent scanning contexts
[diagram: one regex database, compiled earlier, shared by all cores]
• core 0: input → matcher, local data (scratch)
• core n: input → matcher, local data (scratch)
25. How it works
• may return multiple matches
• by default, returns only end offset
• not greedy
• each regexp is parsed and split into:
– literals (fixed strings)
– DFA engines
– NFA engines
– custom engines (prefix, suffix, infix, outfix)
– not Aho-Corasick
• scanning mode – block, streaming, vectored
PCMPEQB (compare packed bytes in xmm2/m128 and xmm1 for equality)
POPCNT (return the count of bits set to 1)
26. DPDK ACL vs. Hyperscan regexp
DPDK ACL
• compiled to „ACL”
• fixed position pattern
• looks up all fields in the packet
• looks up multiple packets at once in
one ACL (up to 16 categories)
• predictable speed
• returns one match (highest priority) per
category
regexp as ACL [1]
• compiled to „DB”
• dynamic position pattern
• skips irrelevant fields
• looks up one packet in DB (multiple regexps at once)
• speed depends on input
• may return multiple matches
[1] speculation, v4.5 is not released yet
27. Sources (Hyperscan)
1. http://01org.github.io/hyperscan/
2. http://www.slideshare.net/harryvanhaaren/hyperscan-mohammad-abdul-awal
3. „HYPERSCAN PERFORMANCE BENCHMARK ON INTEL XEON PROCESSORS, Delivering 160 Gbps DPI Throughput on the Intel
Xeon Processor E5-2600 Series”,
https://networkbuilders.intel.com/docs/1645-Hyperscan-Performance-Benchmark-on-Intel-Xeon-Processors.pdf
4. „HOW WE MATCH REGULAR EXPRESSIONS”, https://01.org/node/3777
5. „Hyperscan Glossary, a few philosophical points”, https://lists.01.org/pipermail/hyperscan/2016-September/000035.html
6. „Software-based Acceleration of Deep Packet Inspection on Intel Architecture”,
https://openisf.files.wordpress.com/2015/11/oisf-keynote-2015-geoff-langdale.pdf
7. "Hyperscan In SURICATA: STATE OF THE UNION",
http://suricon.net/wp-content/uploads/2016/11/SuriCon2016_GeoffLangdale.pdf
8. „Hyperscan in Rspamd”, http://www.slideshare.net/VsevolodStakhov/rspamdhyperscan
9. https://www.reddit.com/r/cpp/comments/3picdx/hyperscan_highperformance_multiple_regex_matching/
29. Basic benchmark
• Xeon E3-1231 v3 @ 3.40GHz, turbo mode disabled, 10G ixgbe port, 1 core
• two cache lines prefetched
• results in Mpps
network net.1 acl
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0
pass
end
regex baz "^foobar"
network net.1 acl
regex drop baz pass udp
pass
end
plnog_udp_acl rx_median 12.912; tx_median 0.000; gen_rx 0.000; gen_tx 14.881
plnog_udp_regexp rx_median 9.832; tx_median 0.000; gen_rx 0.000; gen_tx 14.881
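The ACL rule and the regexp above describe the same test: 0x666f6f626172 is the ASCII string "foobar", and the mask keeps only the first six bytes of the 64-bit word. The sketch below assumes the word is read in network (big-endian) byte order and that a short payload is zero-padded. It illustrates the value/mask semantics only, not the benchmarked tool.

```python
import re

VALUE = 0x666f6f6261720000   # "foobar" followed by two don't-care bytes
MASK = 0xffffffffffff0000    # compare the first 6 of 8 bytes
REGEX = re.compile(rb"^foobar")


def acl_u64_match(payload: bytes, offset: int,
                  value: int = VALUE, mask: int = MASK) -> bool:
    """Value/mask comparison of one 64-bit word taken at a fixed offset."""
    chunk = payload[offset:offset + 8].ljust(8, b"\x00")  # assumed zero padding
    word = int.from_bytes(chunk, "big")
    return (word & mask) == (value & mask)


for payload in (b"foobar and more", b"foobar", b"foobaz!!", b"xfoobar"):
    by_acl = acl_u64_match(payload, 0)
    by_regex = REGEX.match(payload) is not None
    print(payload, by_acl, by_regex)
```

Both checks agree on every sample; the difference measured in the benchmark is in how the engines get to that answer.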
30. Basic benchmark
// ETH() / IP() / UDP() / ('x'*64 + 'foobar')
regex baz "^(.{8}){0,8}foobar"
network net.1 acl
regex drop baz pass udp
pass
end
matching
plnog_udp_acl_many rx_median 5.846; tx_median 0.000; gen_rx 0.000; gen_tx 9.191
plnog_udp_regexp_many rx_median 2.921; tx_median 0.000; gen_rx 0.000; gen_tx 9.191
not matching
plnog_udp_acl_many rx_median 4.518; tx_median 4.518; gen_rx 4.517; gen_tx 9.124
plnog_udp_regexp_many rx_median 5.352; tx_median 5.352; gen_rx 5.353; gen_tx 9.124
network net.1 acl
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 8
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 16
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 24
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 32
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 40
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 48
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 56
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 64
pass
end
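The nine fixed-offset rules and the single regexp ^(.{8}){0,8}foobar are meant to accept the same payloads. The sketch below checks that equivalence by brute force, using Python's `re` as a stand-in matcher and plain byte slicing in place of the u64 value/mask compare. The helper names are hypothetical.

```python
import re

NEEDLE = b"foobar"
OFFSETS = range(0, 72, 8)    # 0, 8, ..., 64: one ACL rule per offset
REGEX = re.compile(rb"^(.{8}){0,8}foobar", re.DOTALL)  # '.' must match any byte


def acl_rules_match(payload: bytes) -> bool:
    """True if any fixed-offset rule finds the literal at its offset."""
    return any(payload[o:o + len(NEEDLE)] == NEEDLE for o in OFFSETS)


def regex_match(payload: bytes) -> bool:
    """True if the literal follows zero to eight 8-byte groups."""
    return REGEX.match(payload) is not None


mismatches = [
    pad for pad in range(0, 80)
    if acl_rules_match(b"x" * pad + NEEDLE) != regex_match(b"x" * pad + NEEDLE)
]
print(len(list(OFFSETS)), mismatches)
print(acl_rules_match(b"x" * 64 + NEEDLE), regex_match(b"x" * 64 + NEEDLE))
```

An empty mismatch list means both rule sets classify every tested padding length identically, including the 64-byte case used in the benchmark packet.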
31. Summary
• header and payload are the same to a matcher: both are just bytes
• regexp engines can be fast
• careful benchmarking required
• x86 platform can compete with „hardware appliances”