SDN operators need to measure the performance of OpenFlow hardware switches on their own sites, because latency can differ by a factor of 1000 depending on the flow entry: the ASIC can forward in a few microseconds, but the software (CPU) path may take milliseconds.
To protect yourself from an unexpected performance plunge, monitor the health of your switches on site.
FPGA based 10G Performance Tester for HW OpenFlow Switch
1. FPGA based 10G Performance Tester for HW OpenFlow Switch
Yutaka Yasuda, Kyoto Sangyo University
2. Why does a HW OpenFlow switch need a (data plane) Performance Test?
• There are some “Conformance Test” activities
• RYU Certification
• ONF PlugFest
• How about a “Performance Test”?
• Without one, you may fall into a pitfall:
• “It works, but too slow”
3. Typical Story : Here is a Flow Entry on the OpenFlow HW Switch…
• There are 2 ways to handle it: by hardware (ASIC) or by software (CPU).
• They are functionally identical, but differ 1000x in latency. ( μsec vs msec )
• It is not always documented. (Basically, vendors have no reason to confess it.)
• A Features Reply is not enough.
• It may depend on the firmware version and the NOS of the switch.
• There is no easy, straightforward way to know.
• Imagine what happens when you update your firmware, NOS, or OF app…
4. Real Example? Here is.
Pica8 + Spirent experiment
[Diagram: an OpenFlow Controller manages a Pica8 3290; a Spirent tester is connected to the switch's ports #2 and #3.]
1. Spirent sends 64-byte packets.
2. Pica8 has a flow entry to forward them from port #2 to port #3.
3. Spirent measures the latency.
5. In Simple and Basic configuration
• Just forwarding from here to there (see below)
• Succeeded in forwarding at wire speed. (1 Gbps)
• Latency : Avg. 4.26, Min 4.13, Max 4.28 (μsec)
Example of the flow entry:
cookie=0x0, duration=1379.649s, table=0, n_packets=0, n_bytes=0,
idle_age=1379, in_port=1,dl_src=00:10:94:00:00:05 actions=output:2
Looks fine!
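The entry above is in ovs-ofctl dump format; to install an equivalent entry you would hand the match/actions string to `ovs-ofctl add-flow`. A minimal sketch of composing that command (the helper and the bridge name `br0` are assumptions for illustration):

```python
# Hypothetical helper: build the "ovs-ofctl add-flow" command line for a
# flow entry like the one dumped above. Bridge name "br0" is an assumption.

def add_flow_cmd(bridge, match, actions):
    """Compose an ovs-ofctl add-flow invocation as a shell command string."""
    return f'ovs-ofctl add-flow {bridge} "{match},actions={actions}"'

cmd = add_flow_cmd("br0", "in_port=1,dl_src=00:10:94:00:00:05", "output:2")
```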
6. Good! and Boom! results
• Good results
• MAC rewrite : no additional latency, no degradation of throughput.
• ToS rewrite : same as above
• Bad and Unexpected result
• IP rewrite : deadly slow. Avg. 140ms, Min 0.8ms, Max 350ms (boom!)
• over 1000 times slower throughput
Example of the flow entry:
cookie=0x0, duration=3.402s, table=0, n_packets=0, n_bytes=0, idle_age=3,
ip,in_port=1,nw_src=192.85.1.5 actions=mod_nw_dst:192.85.1.16,output:2
7. Features Reply?
• It looks like only VLAN and MAC treatment are available.
• In fact….
• ToS modification runs on the hardware.
• IP modification falls back to the software.
• You never know until you have a go.
root@PicOS-OVS#ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000000000000111
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS STP ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP SET_DL_SRC SET_DL_DST ENQUEUE
………
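The Features Reply advertises which actions a switch accepts, but says nothing about whether the ASIC or the CPU will execute them. A small sketch that extracts the advertised action set from an `ovs-ofctl show` dump (the parsing helper is ours, not part of any tool):

```python
# Extract the advertised action set from an ovs-ofctl show dump.
# Note: this only tells you WHICH actions are accepted, not WHETHER
# the ASIC or the CPU will run them -- exactly the blind spot above.

def supported_actions(dump):
    """Return the set of action names on the "actions:" line of the dump."""
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("actions:"):
            return set(line[len("actions:"):].split())
    return set()

reply = """OFPT_FEATURES_REPLY (xid=0x2): dpid:0000000000000111
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS STP ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP SET_DL_SRC SET_DL_DST ENQUEUE"""

acts = supported_actions(reply)
# SET_NW_DST (IP rewrite) is not even listed here, yet the switch accepted
# the mod_nw_dst entry and ran it in software -- the reply cannot warn you.
```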
8. You can test by yourself : several options
• Buy Ixia or Spirent : very accurate but super expensive; just overkill
• PC + 10G NIC + Software : cheap but inaccurate
• Not easy to tune and calibrate well enough. Yes you can, but it is not for everyone.
• FPGA + 10G I/F : not super-cheap, but accuracy guaranteed
• time-stamped by hardware, per clock cycle (8 ns currently)
• all time-sensitive components run independently, with a PC as the mothership
• easy setup: just plug in the board and run the controller app
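Hardware timestamps arrive as raw clock-cycle counts; at the 125 MHz clock named on the next slide, one cycle is 8 ns. A minimal sketch of the conversion (the 32-bit counter width is an assumed detail for illustration):

```python
# Convert cycle-count timestamps to nanoseconds. 125 MHz -> 8 ns/cycle.
CLOCK_HZ = 125_000_000
NS_PER_CYCLE = 1_000_000_000 // CLOCK_HZ   # 8 ns per cycle
COUNTER_BITS = 32                          # assumed free-running counter width

def latency_ns(tx_cycle, rx_cycle, bits=COUNTER_BITS):
    """Latency in ns between two cycle-count timestamps, wrap-around safe."""
    delta = (rx_cycle - tx_cycle) % (1 << bits)
    return delta * NS_PER_CYCLE

# A ~2.7 us ASIC forwarding latency corresponds to ~341 cycles:
lat = latency_ns(100, 441)   # 341 cycles * 8 ns = 2728 ns
```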
9. My project : FPGA based solution
• Xilinx Kintex-7, 125 MHz
• 4x 10G (SFP+) ports
• Hardware TCP/UDP implementation
• PCIe Gen2 x1 (just for control)
• enough external memory
• (SAS ports present, but no need to use them this time)
10. System Structure
[Diagram: the Operator's Browser loads the System Console (a JavaScript app) and sends the test scenario by HTTP POST to the OF Controller (RYU + custom app) on the Host PC via a REST API. The scenario includes the packet generation pattern and the flow entries configuration. The controller sets the packet pattern on the FPGA + 10G I/Fs, configures the Target Switch over the OpenFlow 1.x protocol, and monitors the board. Packet generation, send, receive, and counting are all done in the FPGA board, which sends packets to the switch over 10G Ethernet and observes latency; detailed result data is collected and output back to the browser.]
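Driving the tester from a script instead of the browser's System Console could look like the sketch below. The endpoint URL and the JSON layout of the scenario are assumptions for illustration, not the app's real API:

```python
# Hypothetical client for the RYU custom app's REST API. The URL path and
# the scenario schema are assumptions, not the real interface.
import json
import urllib.request

scenario = {
    "packet_pattern": {                     # what the FPGA should generate
        "length": 64,
        "rate_pps": 1_000_000,
    },
    "flow_entries": [                       # what the controller installs
        {"match": {"in_port": 1}, "actions": ["output:2"]},
    ],
}

def post_scenario(url, scenario):
    """HTTP POST the test scenario to the controller app's REST API."""
    req = urllib.request.Request(
        url,
        data=json.dumps(scenario).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)      # response carries the results
```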
11. Experiment #1 : 10G/1G stable forwarding measurement
Match pattern : In-port X / Action : IP DST mod
Figures 1 and 2 show the "ASIC"-powered results. Every switch has a different
distribution, but all finish in sub-microseconds. Switch A clustered very
steeply around 2.7 μs. Switch B took around 9 μs because it is a 1G switch.
[Figure 1. Switch A (10G) latency distribution: histogram of packets vs. latency (ns), clustered between roughly 2728 and 2880 ns]
[Figure 2. Switch B (1G) latency distribution: histogram of packets vs. latency (ns), clustered between roughly 8448 and 9984 ns (as a proof of the accuracy)]
12. Experiment #2 : Unexpectedly slow forwarding (software fallback)
Match pattern : IP SRC / Action : IP DST mod
Merely adding an IP SRC match made the switch do a "software fallback" (Fig.
3): around 350-500 μs. Moreover, 2.7% of the packets lie beyond the right
edge of the graph; the slowest took over 10 ms. In this case, forwarding is
1000 times slower.
[Figure 3. Switch B (1G) latency distribution in the software fallback situation: histogram of packets vs. latency (ns), spanning roughly 362,000 to 792,000 ns, with a tail continuing further to the right]
In this case, the maximum throughput is only 16 Kpps.
With 100-byte packets, that means 12.8 Mbps.
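The slide's arithmetic, plus the kind of summary statistics quoted throughout, can be reproduced as below. The sample values are made up for illustration:

```python
# Throughput arithmetic: 16 Kpps of 100-byte packets = 12.8 Mbps.
PPS = 16_000
PACKET_BYTES = 100
throughput_mbps = PPS * PACKET_BYTES * 8 / 1_000_000

def summarize(latencies_ns, plot_max_ns):
    """Avg/min/max latency plus the share of samples off the plotted range."""
    n = len(latencies_ns)
    tail = sum(1 for v in latencies_ns if v > plot_max_ns)
    return {
        "avg_ns": sum(latencies_ns) / n,
        "min_ns": min(latencies_ns),
        "max_ns": max(latencies_ns),
        "tail_pct": 100.0 * tail / n,   # packets beyond the right edge
    }

# Fake sample set mimicking Figure 3's shape: most packets around 400 us,
# a small tail far beyond the plotted range.
samples = [400_000] * 97 + [10_000_000] * 3
stats = summarize(samples, plot_max_ns=792_576)
```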
13. Experiment #3 : When will it go slow?
In the case of switch B:
IP matching and IP modification can each be handled by the ASIC separately.
But if you specify both at once, it becomes slow.
YET IP matching and ToS modification CAN be specified both at once!
Totally unexpected…. (sigh)
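Since hardware support turns out to be a property of the (match, action) combination rather than of each feature alone, a tester has to sweep the combinations. A sketch of such a sweep; `measure()` is a hypothetical hook into the real tester, and the 100 μs fallback threshold is an assumption:

```python
# Sweep (match, action) combinations and classify each as ASIC or
# software-fallback by its measured latency. measure() is hypothetical.
from itertools import product

MATCHES = ["in_port", "ip_src", "tos"]
ACTIONS = ["mod_ip_dst", "mod_tos", "mod_mac_dst"]
FALLBACK_NS = 100_000    # assumed threshold: above ~100 us means CPU path

def classify(median_ns):
    return "software" if median_ns > FALLBACK_NS else "asic"

def sweep(measure):
    """measure(match, action) -> median latency in ns, from the tester."""
    return {(m, a): classify(measure(m, a))
            for m, a in product(MATCHES, ACTIONS)}

# Fake measurement reproducing switch B's surprising behaviour:
fake = lambda m, a: 500_000 if (m, a) == ("ip_src", "mod_ip_dst") else 2_700
result = sweep(fake)
```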
14. Use Case #1
Hunt the “killer entry” - the unexpectedly slow processing order you may have
• OF apps set flow entries as they need, but they don’t care about performance.
• When your service suffers performance degradation, you need to make sure that no “killer entry” exists.
[Diagram: flow entries are collected (with counter info) from the OF switches in Your OpenFlow Network and set on a testbed switch. The Performance Tester sets the packet pattern on the packet generator, sends packets, observes the latency, and visualizes the result.]
15. Use Case #2
“Before & after” comparison around an update of the switch driver or NOS
• You need to check for performance degradation BEFORE you apply the update to the REAL network.
• For the future, you also need to see what happens if the flow entries and traffic double.
[Diagram: flow entries X and Y are collected from Your OpenFlow Network and set on a testbed switch. The Performance Tester sets the packet pattern, sends packets, and observes latency, recording result X (before the update) and result Y (after the update), then compares them.]
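The "before & after" comparison reduces to diffing two recorded result sets. A sketch under an assumed result layout (entry name mapped to median latency in ns); the 1.5x tolerance is also an assumption:

```python
# Flag entries whose latency grew past a tolerance after the update.
# Result layout {entry: median_latency_ns} is assumed for illustration.

def compare(result_x, result_y, tolerance=1.5):
    """Return entries more than `tolerance` times slower after the update."""
    regressions = {}
    for entry, before_ns in result_x.items():
        after_ns = result_y.get(entry)
        if after_ns is not None and after_ns > before_ns * tolerance:
            regressions[entry] = (before_ns, after_ns)
    return regressions

result_x = {"in_port->output": 2_700, "ip_src->mod_ip_dst": 2_800}    # before
result_y = {"in_port->output": 2_750, "ip_src->mod_ip_dst": 450_000}  # after
regs = compare(result_x, result_y)   # flags the software-fallback regression
```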
16. Watch for the “Killer Entry”.
To protect yourself from an unexpected performance plunge,
monitor the health of your switches on your own site.