This document discusses experiments conducted with 100G networking for data-intensive research. It provides details on tests of 100G networking performance between various locations in Europe and between Europe and Australia. Key findings include achieving over 90Gbit/s throughput for UDP traffic on dedicated links between sites in Europe. For TCP traffic over long-haul links between Europe and Australia, throughput of around 28Gbit/s was achieved, limited by the TCP protocol and round-trip time of over 300ms for that path. Accelerated receive flow steering techniques with network interface cards were also able to effectively direct different traffic flows to specific processor cores.
Experiments in 100G networking for data-intensive research
1. Advanced European Network of E-infrastructures
for Astronomy with the SKA AENEAS - 731016
www.geant.org
Experiments in 100G networking for
data-intensive research
Richard Hughes-Jones
Networkshop47
Nottingham, 10 April 2019
SKA The Square Kilometre Array
• Australia
• Canada
• China
• France
• India
• Italy
• Netherlands
• New Zealand
• South Africa
• Spain
• Sweden
• UK
Potential new members:
• Germany,
• Japan,
• Portugal
SKA Phase 1 Two Telescopes – One Observatory
SKA1_Mid 350 MHz – 14 GHz
64 MeerKAT dishes & 133 SKA1 dishes
120 km baselines at Karoo
SKA1_Low 50 – 350 MHz
131,000 dipoles 512 stations of 256 antennas
65 km baselines at Murchison
South Africa / Australia
SKA1 Mid in South Africa
SKA1_Mid 350 MHz – 14 GHz
64 MeerKAT dishes and 133 SKA1 dishes
Scattered over a 200 km diameter area in the Karoo
Core 1 km radius Spiral Arms out to 100 km
SKA Uses 3 Networks for Signal and Data Transport
Data Network
• DDBH
• CSP-SDP
• SDP to
Regional Centres
Sync & Timing
• Clock ensemble
• Freq. & Phase nice photonics
• 1 ps accuracy, 10ns over 10 years
• UTC time White Rabbit
Non-Science Data
• Control & Monitor
• Alarms
• Internet, VoIP
SKA Phase 1 Data Flows
~ 2 Pbit/s
SKA1-LOW
SKA1-MID
~ 20 Tbit/s
~ 7 Tbit/s
7.9Tbit/s
9.3Tbit/s
~130 Pflops
300 PB/y
100 Gbit/s
100 Gbit/s
~130 Pflops
300 PB/y
SKA
Regional Centres
The AENEAS Project
European SKA Regional Centre
• Create a specification and design for
a European-scale, federated Regional Centre for the SKA.
• Host replica of the SKA science archive.
• Provide access to and distribute
Science Data Products and the Advanced Data Products .
• Provide compute and storage resources for
SKA science extraction by users.
• Provide analysis capabilities.
• Provide user support.
• Coordination with ICT communities, industry,
and service providers.
• Facilitate shared development, interoperability, and
innovation.
• The ESRC is part of a global network of centres.
The Network Environment: Global Overlay and Dedicated Links
4/12/2019 Richard Hughes-Jones AENEAS Face to Face Meeting / Manchester 9
• OPEX Link and transmission on the academic networks with 10 to 15 year IRUs priced in 2024 US dollars.
• Dedicated Primary 100 Gigabit link USD 750 K each per year sum=2M USD
• Dedicated Backup 100 Gigabit link USD 2M each per year sum=4M USD
• Use of shared NREN paths.
100 Gigabit Tests in the GÉANT Lab
• Core 6 on the socket with the PCIe to the NIC
• ConnectX-5 NICs B2B
• Rx ring buffer 4096
• Send UDP packets and measure the rate.
• Drop of 12.6 Gbit/s at 7813 Bytes exists for NICs
other than Mellanox – artefact in Fedora 23 kernel.
• Smaller drop of 5.1 Gbit/s with firewall ON
• Effect of firewall ~10 Gbit/s reduction.
udpmon_send: The effect of the Firewall
[Figure: send user data rate (Gbit/s) vs size of user data in packet (bytes), firewall ON and firewall OFF; data set pkt_size_send_GEANT-DTN1_22Jan18]
• B2B, no Firewall
• Core 6 is on the socket with the PCIe to the NIC
• ConnectX-5 NIC
• Rx ring buffer 4096
• RTT 0.4 µs
• Expected difference between cores
• Applications different too
• iperf3 while transmitting at 80 Gbit/s
CPU core 98% in kernel mode.
TCP Achievable Throughput: Which cores and application
Single TCP flow iperf2
Single TCP flow iperf3
• B2B, no Firewall
• Core 6 is on the socket with the PCIe to the NIC
• ConnectX-5 NIC
• Packets dropped by the NIC
if Rx ring buffer < 4096
TCP Achievable Throughput: Size of ConnectX-5 ring buffer
Single TCP flows iperf3
% TCP re-transmitted segments rx 1024
• Max packet size 4096 Bytes,
• Every message is acknowledged
• Messages ≥ 20k bytes, throughput > 90 Gbit/s
• CPU core 90% in user mode.
• App design needs to take care of ring buffers
• Only wait for send post completion
every 64 messages.
• Every 64 msg application takes ~42 µs
otherwise 0.1 to 0.2 µs.
RDMA RC
Time between sending messages with RDMA
Achievable throughput vs message spacing
• Modified libvma to fragment correctly
• Standard udpmon applications
• UDP performance excellent
throughput >95 Gbit/s core6 – core6
• TCP performance poor:
• TCP iperf kernel 57.7 Gbit/s
• TCP iperf libvma 13.6 Gbit/s
UDP udpmon and libvma kernel bypass library
Smooth increase in BW vs packet size to > 90 Gbit/s
Throughput as a function of packet spacing
Inter-packet arrival times FWHM 2 µs
100 Gigabit Tests with the DTNs on the GÉANT Network
Network Topology Connecting the DTNs
Can be Trunks
(multi VLANs)
• London to Paris over GÉANT
• No Firewall
• Core 6 is on the socket with the PCIe to the NIC
• ConnectX-5 NIC
• Rx ring buffer 8192
• Throughput 43 Gbit/s for 7813 Byte packet
• Jitter 4 µs FWHM
• Some side lobes at ± 16 µs
due to cross traffic
• Good network stability.
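The relationship between frame spacing and wire rate that underlies the throughput plot can be sketched as follows. This is a minimal illustration, not the udpmon implementation: the function names are hypothetical, and the 66 byte per-frame overhead (UDP, IPv4 and Ethernet headers, FCS, preamble and inter-frame gap) is an assumption that is not stated on the slides.

```python
# Assumption (not from the slides): per-frame overhead for UDP over IPv4 on Ethernet.
# UDP 8 + IPv4 20 + Ethernet header 14 + FCS 4 + preamble 8 + inter-frame gap 12 bytes.
WIRE_OVERHEAD_BYTES = 8 + 20 + 14 + 4 + 8 + 12


def wire_rate_gbit(user_bytes, spacing_us):
    """Wire rate in Gbit/s when one frame leaves every spacing_us microseconds."""
    bits_on_wire = (user_bytes + WIRE_OVERHEAD_BYTES) * 8
    return bits_on_wire / (spacing_us * 1e-6) / 1e9


def min_spacing_us(user_bytes, line_rate_gbit=100.0):
    """Smallest frame spacing in microseconds that the given line rate can carry."""
    bits_on_wire = (user_bytes + WIRE_OVERHEAD_BYTES) * 8
    return bits_on_wire / (line_rate_gbit * 1e9) * 1e6


# Packet sizes used in the plots on this slide.
for size in (4000, 6000, 7813, 8972):
    print(f"{size} bytes: spacing >= {min_spacing_us(size):.3f} us at 100 Gbit/s")
```

Under these assumptions, any spacing shorter than the value printed for a packet size would ask for more than the 100 Gbit/s line rate, so the achievable rate flattens out or packets are lost below that spacing.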
UDP Performance over GÉANT: Throughput and Packet Jitter
[Figure: Achievable UDP throughput. Receive wire rate (Gbit/s) vs spacing between frames (µs) for 4000, 6000, 7813 and 8972 byte packets; data set DTNLon-Par_100G_NOFW_03Jul18]
[Figure: Inter-packet arrival times. N(t) vs latency (µs), 1472 byte packets, w = 80; data set DTNLon-Par_100G_NOFW_03Jul18]
• Route: London-London2-Paris
• TCP offload on, TCP cubic stack
• Firewalls ON
• RTT 7.5 ms.
• Delay Bandwidth Product 93.8 MB for a 100 Gbit/s flow.
• One TCP flow rises smoothly to the 36 Gbit/s plateau
at window of ~35 MBytes. (Includes Slowstart)
• Rate after slowstart 37.1 Gbit/s
• Plateau from 5s onwards
• NO TCP re-transmitted segments
• Achievable throughput limited by CPU not DBP
• Active core 100 % in kernel mode for TCP buffer ≥ 40 MB
• Lab tests got ~60 Gbit/s
• FireWalls OFF improves by ~ 4 Gbit/s
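The delay bandwidth product quoted above follows directly from the RTT and the line rate. A minimal sketch, assuming decimal megabytes (1 MB = 10^6 bytes), which reproduces the 93.8 MB figure; the function name is illustrative.

```python
def delay_bandwidth_product_bytes(rate_bit_s, rtt_s):
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_bit_s * rtt_s / 8


RTT_S = 7.5e-3  # London to Paris RTT from the slide

full_rate = delay_bandwidth_product_bytes(100e9, RTT_S)  # window for 100 Gbit/s
plateau = delay_bandwidth_product_bytes(36e9, RTT_S)     # window for the 36 Gbit/s plateau

print(f"Window for 100 Gbit/s: {full_rate / 1e6:.2f} MB")
print(f"Window for 36 Gbit/s:  {plateau / 1e6:.2f} MB")
```

The 36 Gbit/s case comes out at about 34 MB, in line with the window of ~35 MBytes at which the plateau is reported. Larger buffers do not help once the CPU core is saturated.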
100 Gigabit TCP Performance GÉANT London to Paris
[Figure: BW (Gbit/s) vs time in the flow (s) for TCP buffer sizes 3.0M, 20M, 40M, 80M and 100M; data set DTNLon-Par_100G_TCPbuf_03Jul18]
[Figure: BW (Gbit/s) vs buffer size (Mbyte); data set DTNLon-Par_100G_TCPbuf_03Jul18]
• RTT 7.5 ms
• TCP buffer size 40 MBytes
• TCP throughput over 30 Hrs
• 32.5 Gbit/s
• No TCP segment re-transmissions
• Very stable
TCP Performance London – Paris 32 Gbit/s Single Flow Over GÉANT
[Figure: BW (Gbit/s) vs time during transfer (s); data set DTNLon-Par_TCP-tseries_15Mar18]
AENEAS DTNs & Network Topology Jodrell Bank to GÉANT
• UDP throughput 99 Gbit/s for >7813 byte packets
• 10-20% packet loss for 6k & 4k byte packets
at spacing < 0.5 µs
• Packet Jitter very good
FWHM 2 µs no side lobes
100 Gigabit UDP udpmon with libvma GÉANT DTNlon to JBO
[Figure: Achievable UDP throughput. Receive wire rate (Gbit/s) vs spacing between frames (µs) for 4000, 6000, 7813 and 8972 byte packets; data set DTNLon-Remus_VMA_a40_02Jul18]
[Figure: Packet loss as a function of packet spacing. % packet loss (log scale) vs spacing between frames (µs) for 4000, 6000, 7813 and 8972 byte packets; data set DTNLon-Remus_VMA_a40_02Jul18]
[Figure: Inter-packet arrival times, FWHM 2 µs. N(t) vs latency (µs), 1472 byte packets, w = 80; data set DTNLon-Remus_VMA_a40_02Jul18]
Using the NIC for accelerated Receive Flow Steering aRFS 1
• Direct the processing of interrupts for received flows to a given core.
• ethtool can attach a RX ring to a specified flow:
ethtool --config-ntuple ens7 flow-type udp4 dst-port 14233 loc 1 action 10
ethtool --config-ntuple ens7 flow-type udp4 dst-port 14234 loc 2 action 11
ethtool --config-ntuple ens7 flow-type udp4 dst-port 14235 loc 3 action 12
ethtool --config-ntuple ens7 flow-type udp4 dst-port 14236 loc 4 action 13
• Send three 20 Gbit/s UDP flows each with 10M packets
RX ring & core
interface Filter number
Flow | IRQs not moved | Remove IRQ from app cores | Steer flows to app cores | IRQ & apps different cores | IRQ & apps different cores, fix sender
1 | 52% 9.9 Gbit | 24% 15.5 Gbit | 28% 14.6 Gbit | 0 % 20 Gbit | 0 % 20 Gbit
2 | 19% 16.6 Gbit | 0.002% 20.5 Gbit | 29% 14.5 Gbit | 0 % 20 Gbit | 0 % 20 Gbit
3 | 0% 17.3 Gbit | 4.4% 16.4 Gbit | 1.1% 17 Gbit | 0 % 16 Gbit | 0 % 20 Gbit
4 | | | | | 0 % 20 Gbit
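The four ethtool rules on this slide follow a regular pattern, so they can be generated rather than typed. A minimal sketch that only builds the command strings and does not run them; the function name and its parameters are illustrative, and the interface, ports, filter locations and RX ring numbers are those shown in the commands above.

```python
def ntuple_commands(interface, first_port, first_ring, count):
    """Build one ethtool n-tuple rule per UDP flow, one destination port per RX ring."""
    commands = []
    for i in range(count):
        commands.append(
            f"ethtool --config-ntuple {interface} flow-type udp4 "
            f"dst-port {first_port + i} loc {i + 1} action {first_ring + i}"
        )
    return commands


# Reproduce the rules from the slide: ports 14233-14236 steered to RX rings 10-13.
for command in ntuple_commands("ens7", 14233, 10, 4):
    print(command)
```

Keeping the port to ring mapping in one place makes it easier to keep the receiving applications on the cores that match the rings they are steered to.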
Using accelerated Receive Flow Steering in Real Life
• Send four 20 Gbit/s UDP flows run for 8 hours
• A few of the 10 s sample periods showed some small packet loss, overall 4 × 10^-7 %.
Cores processing IRQs
Cores processing application
Reaching the TCP Limit on Long-Haul
The Network Path between GÉANT (Lon, Par) – AARNet (Canberra, MRO)
Thanks to Karl Meyer
• Route GÉANT, ANA300, Internet2, & AARNet:
Paris-New York-Seattle-Los Angeles-Sydney-Canberra
• TCP offload on, TCP cubic stack
• Fedora 26 kernel 4.11.0-0.rc3.git0.2.fc26.x86_64
• RTT 303 ms.
• Delay Bandwidth Product 3.78 GB for 100 Gigabit
• One TCP flow rises smoothly to 26.1 Gbit/s
at 1023 MBytes including slowstart.
• No TCP re-transmitted segments
• Rate after slowstart 28.3 Gbit/s
• Plateau after ~15s
• Reach the limit of TCP protocol
Max TCP window is 1 Gbyte
• Rate for RTT 303 ms and TCP window 1023 MB
28.32 Gbit/s
• CPU core only 75-80 % in kernel mode
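The 28.32 Gbit/s figure is the window-limited rate: one full TCP window delivered per round trip. A short sketch, assuming that the MB in the buffer sizes are binary megabytes (2^20 bytes), since that assumption reproduces the number on the slide; the helper name is hypothetical.

```python
MIB = 2**20  # assumption: buffer sizes on the slide are in units of 2^20 bytes


def window_limited_gbit(window_bytes, rtt_s):
    """Upper bound on a single TCP flow: one full window per round-trip time."""
    return window_bytes * 8 / rtt_s / 1e9


RTT_S = 0.303  # Paris to Canberra RTT from the slide

# Buffer sizes shown in the legend of the time-series plot.
for window_mb in (50, 500, 750, 900, 1023):
    rate = window_limited_gbit(window_mb * MIB, RTT_S)
    print(f"{window_mb:>5} MB window: at most {rate:.2f} Gbit/s")
```

With the core only 75-80 % busy, this bound rather than the CPU is what caps the flow, which is why the measured rate after slowstart sits at the same value.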
100 Gigabit between GÉANT Paris and AARNet Canberra
[Figure: BW (Gbit/s) vs buffer size (Mbyte); data set DTNPar-AARNetCan_TCPbuf_30May18]
[Figure: BW (Gbit/s) vs time in the flow (s) for TCP buffer sizes 50M, 500M, 750M, 900M and 1023M; data set DTNPar-AARNetCan_TCPbuf_30May18]
• To get past the 64 kB window size limit there is the Window Scale factor,
negotiated at the SYN exchange. RFC 7323 (obsoletes RFC 1323)
• Max value 14, max window 2^(16+14) bytes = 1024 MB
• Window size < sequence number space
• Deal with sequence number wrapping
• Allows telling whether a segment is old or new
The TCP Protocol Limit
Sequence number field: 2^32 = 4096 MB
TCP Header
Window field: 2^16 = 64 kB
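The limits on this slide can be checked with a few lines of arithmetic. A sketch based on the field widths in the TCP header and the maximum shift of 14 allowed by RFC 7323; the 100 Gbit/s rate used for the wrap time is an assumption chosen to match the tests, and the variable names are illustrative.

```python
WINDOW_FIELD_BITS = 16  # window field in the TCP header
MAX_WINDOW_SHIFT = 14   # largest window scale shift allowed by RFC 7323
SEQUENCE_BITS = 32      # sequence number field in the TCP header

unscaled_window = 2**WINDOW_FIELD_BITS                  # 64 kB without scaling
max_window = 2**(WINDOW_FIELD_BITS + MAX_WINDOW_SHIFT)  # the 1024 MB on the slide
# The largest value that can actually be advertised is slightly below 2^30.
largest_advertised = (2**WINDOW_FIELD_BITS - 1) << MAX_WINDOW_SHIFT
sequence_space = 2**SEQUENCE_BITS                       # 4096 MB


def wrap_time_s(rate_bit_s):
    """Time for the 32-bit sequence number space to wrap at a given rate."""
    return sequence_space * 8 / rate_bit_s


print(f"Unscaled window: {unscaled_window // 2**10} kB")
print(f"Max scaled window: {max_window // 2**20} MB")
print(f"Sequence space: {sequence_space // 2**20} MB")
print(f"Sequence wrap at 100 Gbit/s: {wrap_time_s(100e9):.3f} s")
```

The window has to stay well below the sequence space so that a receiver can tell old segments from new ones after the sequence number wraps, which at 100 Gbit/s happens roughly three times a second.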
• Route GÉANT, ANA300, Internet2, & AARNet:
Paris-New York-Seattle-Los Angeles-Sydney-Canberra
• RTT 303 ms.
• TCP window 1023 MB.
• Two 4 minute TCP flows
• Second flow started 30s after the first
• Each flow stable at 28.3 Gbit/s
• Total transfer rate 56.6 Gbit/s
• 1.55 Tbytes data sent in 4.5mins.
• No TCP segments re-transmitted.
100 Gigabit: Multiple flows between GÉANT and AARNet
[Figure: BW (Gbit/s) vs time during transfer (s); data set exp1-AARNet_TCP_teries_04Feb17]
Questions
AARNet, GÉANT and SANReN are partners in the SaDT consortium.
2019 IGO signed
2019 System design finalised
2020 Construction begins
New affordable path Singapore - Europe
Issue with the application state machine model
What happens if receiver is waiting for data and sender fails?
Separate interface for ssh access
Data transfer interfaces 10GE and 100GE direct into the GÉANT Core routers
Creation of different VLANs on the Data transfer interfaces is possible
As you would hope throughput very similar to the lab
For RTT 7.5ms and buffer 40MB expect 42.7 Gbit/s
Flows on the two 100GE wavelengths that form the LAG London to Paris
100 Gigabit Layer 2 circuit
But recent tests from Paris showed some packet loss
CSP-SDP in the telescope: 90 × 100 Gigabit links.
London & Paris
Canberra and MRO
Factor of 2 between transmit and receive
Wraps every 1/3 sec
Internet Draft Jan 2017 suggests can increase the max window size
RFC 7323 obsoletes 1323
For 3 flows all were reduced but no TCP segments re-transmitted