2. Agenda
• IDS/IPS Application Packet Pipeline
• Explore into
Bottlenecks    Solutions
I/O            PCIe Slot-NUMA map
CPU            Custom Libraries
Application    Packet Filter, Lookup, Distribution and Modeling
Ecosystem      VirtIO, Proc-Info, SIMD, Custom Lookup
3. Look into Suricata
Worker Thread: RX NIC -> Capture -> Decode -> Stream -> Detect -> Output -> TX NIC
Suricata is a free and open source, mature, fast and robust network threat detection engine. The Suricata engine is capable of real time intrusion detection (IDS), inline intrusion prevention (IPS), network security monitoring (NSM) and offline pcap processing.
"Suricata's multi-threaded architecture can support high performance multi-core and multiprocessor systems, Jonkman said." -- (Computerworld)
Flow identification
Stream Identification
Stream Capture
Buffers & Flows limit
Copies
Exact match
Pattern match
4. IDS-IPS in Passive & Active mode
Network I/O (Multiple 10Gbit/s Interfaces)
Control, Configuration and Stats (CLI and Socket interface)
High Speed User Space TCP and SSL stack configured in proxy mode.
Clear Text
Encrypted Encrypted
5. Dive into Bottlenecks
Do we need to re-invent the Intrusion Detection, Intrusion Prevention or Network
Security Monitoring utility?
6. 6
SoC using PCIe virtual dev library
Network I/O
Config & Mgmt
TCP SSL Stack
DPDK PMD & MBUF Manager
User Space
• Keep packet in User Space
• Reduce NIC-to-NIC latency
• Smart Filter
DPDK PMD library to rescue the I/O bottleneck
7. 7
[Chart: throughput in Mbit/s (axis 0 to 1000) by packet size, for 64 byte RX, 64 byte TX, 1500 byte RX and 1500 byte TX. Data labels as extracted, series assignment not recoverable: 480, 150, 780, 220, 625, 273, 944, 473.]
Packet
NIC to NIC PCIe 1 queue
The SoC allows up to 32 bidirectional PCIe user-space queues
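
To compare the Mbit/s figures across packet sizes, it can help to convert them into packets per second. A minimal sketch, assuming the reported throughput counts frame bytes only (no preamble or inter-frame gap); the function name `mbits_to_pps` is hypothetical, and pairing 480 Mbit/s with 64-byte packets is an assumption made only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a throughput in Mbit/s to packets per second for a fixed
 * frame size. Assumption: the throughput counts frame bytes only;
 * the 20 bytes of preamble and inter-frame gap on the wire are ignored.
 * The result is truncated towards zero.
 */
static uint64_t mbits_to_pps(uint32_t mbits_per_sec, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    uint64_t bits_per_sec = (uint64_t)mbits_per_sec * 1000000u;
    uint64_t bits_per_frame = (uint64_t)frame_bytes * 8u;
    return bits_per_sec / bits_per_frame;
}
```

For example, `mbits_to_pps(480, 64)` gives 937500 packets per second, while the same 480 Mbit/s with 1500-byte packets is only 40000 packets per second. This is why small-packet numbers stress the per-packet path much harder than the raw Mbit/s suggests.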
9. Suricata using DPDK
RX NIC TX NIC
Capture Decode Stream Detect Output
Worker Threads
Capture Decode Stream Detect Output
RSS HASH
Parse for metadata
Match for rule set
Buffer & Zero Copy
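
The RSS HASH step spreads flows over the worker threads so that both directions of a connection reach the same worker. A minimal software sketch of that idea follows. It is not the NIC's actual RSS function (hardware RSS typically uses a Toeplitz hash) and not Suricata's or DPDK's code; `struct flow_key`, `flow_hash` and `pick_worker` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 5-tuple key; field names are illustrative only. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};

/* 32-bit mixer (the MurmurHash3 finalizer), a stand-in for hardware RSS. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/*
 * Symmetric flow hash: addresses and ports are each put in (low, high)
 * order before mixing, so A->B and B->A produce the same value.
 */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t ip_lo = k->src_ip < k->dst_ip ? k->src_ip : k->dst_ip;
    uint32_t ip_hi = k->src_ip < k->dst_ip ? k->dst_ip : k->src_ip;
    uint16_t pt_lo = k->src_port < k->dst_port ? k->src_port : k->dst_port;
    uint16_t pt_hi = k->src_port < k->dst_port ? k->dst_port : k->src_port;

    uint32_t h = mix32(ip_lo);
    h = mix32(h ^ ip_hi);
    h = mix32(h ^ (((uint32_t)pt_hi << 16) | pt_lo));
    h = mix32(h ^ k->proto);
    return h;
}

/* Map a flow to one of n_workers worker threads. */
static unsigned pick_worker(const struct flow_key *k, unsigned n_workers)
{
    assert(n_workers > 0);
    return flow_hash(k) % n_workers;
}
```

Keeping a flow on one worker means the Stream and Detect stages see every packet of a connection in one place, without cross-thread locking on flow state.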
12. Setup
Supermicro 4-core Xeon at 2.6 GHz with onboard 2 * 1G i350 (2x PCIe Gen2)
DPDK 1 core - 2 worker cores, 1 DPDK RX-TX. AF-Workers - 3 worker cores
• Distributed lcore and NIC, i.e. a single socket interfaces with a single NIC (4 * 10G).
• Single machine for processing, filter, flow and Suricata.
• Reduced packet latency, since there is no inter NIC-NIC transmission.
• Localized user-space DPDK and custom Suricata help achieve zero copy.
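
Keeping lcore and NIC on the same socket starts with knowing which NUMA node a PCIe slot belongs to. A minimal sketch, assuming a Linux host that exposes the node through the sysfs `numa_node` attribute of the PCI device; the helper names below are hypothetical and this is not part of DPDK or Suricata:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build the sysfs path for a PCI address such as "0000:03:00.0". */
static int pci_numa_path(const char *bdf, char *out, size_t len)
{
    int n = snprintf(out, len, "/sys/bus/pci/devices/%s/numa_node", bdf);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Read a NUMA node id from a sysfs-style file. Returns -1 when the file
 * cannot be read or parsed; the kernel also reports -1 when it has no
 * NUMA information for the device.
 */
static int read_numa_node(const char *path)
{
    int node = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &node) != 1)
        node = -1;
    fclose(f);
    return node;
}

/* A NIC counts as local if its node is unknown or matches the core's. */
static int nic_is_local(int nic_node, int core_node)
{
    assert(core_node >= 0);
    return nic_node < 0 || nic_node == core_node;
}
```

A launcher could build the path for each NIC, read its node, and only assign worker cores for which `nic_is_local` is true, so that packet buffers and the cores polling them stay on one socket.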
Learnings
17. VirtIO Hurdles
1. Device start & stop not working
2. Link state set up & down fails
3. LSC (link state change) callback does not work
4. Application proc-info does not show stats for the right primary application.
5. Application proc-info corrupts rte_dev_data when pcap is in use