Packet buffer memory is among the oldest topics in networking, yet it never seems to fade in popularity. From buffers sized by the bandwidth-delay product to what is now called "bufferbloat", and from 10 Mbps to 100 Gbps, the question of how deep buffers should be never fails to evoke opinionated responses.
In this webinar we will be joined by JR Rivers, co-founder and CTO of Cumulus Networks, who has designed many highly successful switching chips, switch products, and compute platforms, to discuss the innards of buffering. The webinar covers data path theory, tools to evaluate network data path behavior, and the configuration variations that affect application-visible outcomes.
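As a rough illustration of the bandwidth-delay product sizing rule mentioned above, the following sketch computes the bytes in flight over one round trip. The link speeds come from the abstract; the 1 ms round-trip time is an assumed example value, not a figure from the webinar.

```python
def bdp_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product in bytes: bits in flight during one RTT, divided by 8."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    ASSUMED_RTT = 1e-3  # 1 ms, an illustrative assumption
    for name, bps in (("10 Mbps", 10e6), ("100 Gbps", 100e9)):
        kib = bdp_bytes(bps, ASSUMED_RTT) / 1024
        print(f"{name}: {kib:.1f} KiB in flight at {ASSUMED_RTT * 1e3:.0f} ms RTT")
```

The point of the arithmetic is only that the classic rule scales linearly with link speed, which is why the sizing question keeps coming back as links get faster.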
How deep is your buffer – Demystifying buffers and application performance
March 14, 2017
JR Rivers | Co-founder/CTO
A JOURNEY TO DEEPER UNDERSTANDING
Network Data Path
How Much Buffer – the take away
If the last bit of performance matters to you, do the testing
  - be careful of what you read
If not, take solace…
…the web-scales use “small buffer” switches
Tools and Knobs – Show and Tell
cumulus@server02:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Model name: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
Stepping: 5
CPU MHz: 1600.000
CPU max MHz: 2268.0000
CPU min MHz: 1600.0000
BogoMIPS: 4441.84
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
[Figure: test topology with 25GE-attached servers and 100G interconnect. Nodes: Internet, edge01, exit01, spine01, leaf01, leaf03, server01, server02, server03, server04, oob-mgmt-server, oob-mgmt-switch. Link labels: 100G, 25G, Link Under Test.]
Tools and Knobs – TCP Tuning
cumulus@edge01:/proc/sys/net/ipv4$ ls tcp_*
tcp_abort_on_overflow tcp_keepalive_probes tcp_reordering
tcp_adv_win_scale tcp_keepalive_time tcp_retrans_collapse
tcp_allowed_congestion_control tcp_limit_output_bytes tcp_retries1
tcp_app_win tcp_low_latency tcp_retries2
tcp_autocorking tcp_max_orphans tcp_rfc1337
tcp_available_congestion_control tcp_max_reordering tcp_rmem
tcp_base_mss tcp_max_syn_backlog tcp_sack
tcp_challenge_ack_limit tcp_max_tw_buckets tcp_slow_start_after_idle
tcp_congestion_control tcp_mem tcp_stdurg
tcp_dsack tcp_min_rtt_wlen tcp_synack_retries
tcp_early_retrans tcp_min_tso_segs tcp_syncookies
tcp_ecn tcp_moderate_rcvbuf tcp_syn_retries
tcp_ecn_fallback tcp_mtu_probing tcp_thin_dupack
tcp_fack tcp_no_metrics_save tcp_thin_linear_timeouts
tcp_fastopen tcp_notsent_lowat tcp_timestamps
tcp_fastopen_key tcp_orphan_retries tcp_tso_win_divisor
tcp_fin_timeout tcp_pacing_ca_ratio tcp_tw_recycle
tcp_frto tcp_pacing_ss_ratio tcp_tw_reuse
tcp_fwmark_accept tcp_probe_interval tcp_window_scaling
tcp_invalid_ratelimit tcp_probe_threshold tcp_wmem
tcp_keepalive_intvl tcp_recovery tcp_workaround_signed_windows
tcp_ecn - INTEGER
    Control use of Explicit Congestion Notification (ECN) by TCP.
    ECN is used only when both ends of the TCP connection indicate
    support for it. This feature is useful in avoiding losses due
    to congestion by allowing supporting routers to signal
    congestion before having to drop packets.
    Possible values are:
        0 Disable ECN. Neither initiate nor accept ECN.
        1 Enable ECN when requested by incoming connections and
          also request ECN on outgoing connection attempts.
        2 Enable ECN when requested by incoming connections
          but do not request ECN on outgoing connections.
    Default: 2
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
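To make the three tcp_ecn settings concrete, here is a minimal sketch that reads the knob from the /proc/sys/net/ipv4 directory listed above and prints what the current value means. The helper names (describe_tcp_ecn, read_tcp_ecn) are hypothetical, and the meanings are paraphrased from the kernel documentation excerpt; on a host without that file the script simply says so.

```python
from pathlib import Path
from typing import Optional

# Location of the knob, per the directory listing on the slide
TCP_ECN_PATH = Path("/proc/sys/net/ipv4/tcp_ecn")

# Paraphrased from the ip-sysctl.txt excerpt above
TCP_ECN_MEANINGS = {
    0: "ECN disabled: neither initiated nor accepted",
    1: "ECN accepted on incoming connections and requested on outgoing ones",
    2: "ECN accepted on incoming connections, not requested on outgoing ones",
}


def describe_tcp_ecn(raw: str) -> str:
    """Map the raw sysctl text (for example '2\\n') to a short description."""
    value = int(raw.strip())
    return TCP_ECN_MEANINGS.get(value, f"value {value} is not covered by the excerpt")


def read_tcp_ecn(path: Path = TCP_ECN_PATH) -> Optional[str]:
    """Return the raw file contents, or None if the file cannot be read."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_tcp_ecn()
    if raw is None:
        print(f"{TCP_ECN_PATH} is not readable on this host")
    else:
        print(f"tcp_ecn = {raw.strip()}: {describe_tcp_ecn(raw)}")
```

With the documented default of 2, a host accepts ECN when a peer asks for it but does not request it itself, so both ends of a test link would typically need the value 1 for ECN to be negotiated on connections they originate.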
Tools and Knobs – What’s next for me
Find/write a good “mice” traffic generator
  - modify iperf3 to include mean-time-to-completion with blocks
DCTCP with both ECN and Priority Flow Control
  - High-performance fabrics combine end-to-end congestion management and lossless links
    (InfiniBand, Fibre Channel, PCIe, NUMAlink, etc.)
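The "mice" generator idea can be sketched in a few lines. This is not a modified iperf3 and not the tool the speaker has in mind; it is a hypothetical stand-in that opens many short TCP flows to a loopback sink, sends one small block per flow, and reports the mean time to completion. The function names, block size, and flow count are all assumptions chosen for illustration.

```python
import socket
import statistics
import threading
import time
from typing import List


def _sink(server: socket.socket) -> None:
    """Accept flows one at a time, drain each until EOF, then send a 1-byte ack."""
    while True:
        try:
            conn, _ = server.accept()
        except OSError:
            return  # listening socket is no longer usable
        with conn:
            while conn.recv(65536):
                pass
            conn.sendall(b"\x01")


def run_mice(flows: int = 20, block_bytes: int = 16 * 1024) -> List[float]:
    """Run short flows sequentially over loopback; return per-flow completion times in seconds."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    # Daemon thread, so it never keeps the process alive after the measurement
    threading.Thread(target=_sink, args=(server,), daemon=True).start()
    payload = b"x" * block_bytes
    times: List[float] = []
    try:
        for _ in range(flows):
            start = time.perf_counter()
            with socket.create_connection(("127.0.0.1", port)) as sock:
                sock.sendall(payload)
                sock.shutdown(socket.SHUT_WR)  # signal end of block to the sink
                sock.recv(1)                   # wait for the sink's ack
            times.append(time.perf_counter() - start)
    finally:
        server.close()
    return times


if __name__ == "__main__":
    results = run_mice()
    mean_us = statistics.mean(results) * 1e6
    print(f"{len(results)} flows, mean time to completion {mean_us:.0f} us")
```

Each timing includes connection setup, the block transfer, and the acknowledgement, which is the quantity short flows are sensitive to. Pointing the client at a sink on another host across the link under test, rather than loopback, would be the obvious next step.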
How Much Buffer – the take away
If the last bit of performance matters to you, do the testing
  - be careful of what you read
If not, take solace…
…the web-scales use “small buffer” switches