Why 10 Gigabit Ethernet? Why Now?
With the advent of ultra-low-latency switches such as the Nexus 5000, which provides consistent sub-3 uSec latency regardless of load or packet size, End-to-End latency is becoming more important. From an End-to-End latency perspective, roughly 90% of latency is In-Host, as opposed to In-Network. In addition to faster switches and decreased serialization delay, 10GE NIC technology allows for lower CPU Utilization and reduced In-Host latency.
Figure 1: Cisco Nexus 5000 Series 10 Gigabit Ethernet Switches
Nexus 5000 Data Sheet: http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/data_sheet_c78-461802.html
What About Infiniband?
Infiniband (IB) came about with the promise of Ultra Low Latency and Low CPU Utilization. With this
came a new set of problems. Ethernet has become the ubiquitous standard in the industry. We are
even starting to see conventional High Performance Computing vendors such as Myricom and Voltaire
and SAN vendors such as Brocade develop Ethernet products as Network, Storage, and HPC
environments are being converged into a single Unified 10GE Fabric. When communicating outside of
your LAN, to an exchange, for example, traffic must pass through an Infiniband Gateway to translate IB
to Ethernet, making any theoretical latency gain negligible in real-world scenarios. To take advantage of
the latencies Infiniband promised, applications had to be re-written to use RDMA, a sacrifice many were
not willing to make. IB also does not provide features that are standard in the Ethernet world such as
ACLs or QoS. In addition, conventional Network Monitoring Tools and Sniffers do not work with IB.
What is RDMA?
RDMA, or Remote Direct Memory Access, is a technology that allows a sender to write directly to a
receiver’s memory, bypassing the kernel. With conventional NICs, packets entering a NIC are processed
by the server’s CPU using the Operating System’s UDP/IP Stack. This process requires multiple
interrupts, context switches, and copies of the data before it ends up in application memory, available
for use.
Figure 2: Conventional Server I/O
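To make the conventional path concrete, the following is a minimal sketch (not taken from this paper) of a UDP receiver using the standard sockets API. Every datagram it receives has already been handled by the kernel's UDP/IP stack and copied from kernel buffers into the application's buffer; the port number is an arbitrary placeholder.

```c
/* Minimal sketch of the conventional, kernel-based receive path described
 * above. Each recvfrom() involves interrupt-driven kernel receive, protocol
 * processing, and a copy from kernel space into the application buffer.
 * The port number is an illustrative placeholder. */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);           /* kernel-managed socket */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);                        /* placeholder port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[2048];
    /* The data crosses the kernel UDP/IP stack and is copied into buf. */
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    printf("received %zd bytes via the kernel UDP/IP stack\n", n);

    close(fd);
    return 0;
}
```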
In an RDMA environment, this process, known as Kernel Bypass/Zero Copy, is much simpler. With
RDMA, the packet is processed by the NIC and copied directly into application memory without
requiring processing by the CPU. This ultimately produces reduced In-Host Latency and lower CPU
Utilization.
Figure 3: Kernel Bypass/Zero Copy Server I/O
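For readers curious where the zero-copy property comes from, here is a hedged sketch, assuming a Linux host with libibverbs installed, of the memory-registration step that underpins RDMA: application memory is pinned and handed to the NIC so the adapter can DMA incoming data into it directly. Queue-pair creation and connection setup are omitted for brevity; this is an illustration of the mechanism, not test code from this paper.

```c
/* Sketch of the registration step at the heart of Kernel Bypass/Zero Copy.
 * Connection setup (queue pairs, completion queues, address exchange) is
 * omitted; this only shows how application memory is pinned and exposed to
 * the NIC so the adapter can DMA packets into it without kernel copies.
 * Compile with: gcc kernel_bypass_sketch.c -libverbs */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA-capable device found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);              /* protection domain */

    /* Application-owned receive buffer, registered with the NIC. */
    size_t len = 1 << 20;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* A queue pair would now post receive work requests pointing at mr;
     * incoming data then lands in buf with no CPU copy or kernel involvement. */
    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}
```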
What Cables Can I Use?
10GbaseT
10GbaseT will allow for 10GE speeds over Cat6a cabling. This is the eventual low-cost solution for
10G/1G/100Mbps communication; however, the technology is still in its infancy. Today’s 10GbaseT PHYs
consume ~8W and induce 2.5 uSec of latency per port. These figures will eventually be reduced, and 10GbaseT will be
incorporated into future Cisco products; however, it is not supported today.
TwinAx (CX1)
The current low-cost, low-power solution for 10GE is TwinAx cabling. This consists of a copper (CX1)
cable with an SFP+ Transceiver directly attached to each end.
Note: Each Transceiver induces an additional 50 nSec of latency, 100 nSec total per cable.
Figure 4: TwinAx Cable
SFP+
SFP+ provides the lowest-latency solution today. With a variety of SFP+ transceivers available for
multimode and single-mode fiber, there are plenty of options for 10GE cabling without the added latency of
10GbaseT or TwinAx. With its smaller form factor, lower cost, and lower power consumption compared
to previous X2 and XENPAK transceivers, SFP+ allows for much higher port densities than previously
possible. The Nexus 5010 currently supports up to 26 Line Rate 10GE ports in a compact 1 RU form
factor, with the 5020 providing 52 Line Rate 10GE ports in 2 RUs. SFP+ transceivers look, smell, and feel
like SFP transceivers but operate at 10 Gbps. A limited number of SFP+ ports will also
accept GE SFP Transceivers for backwards compatibility.
What NICs Should I Use?
iWARP
iWARP utilizes RDMA over Ethernet instead of Infiniband. This provides the same Kernel Bypass/Zero
Copy functionality without the need for a secondary infrastructure. However, with iWARP, just as with
IB, applications must be written to the libibverbs library to take advantage of this functionality.
Key Players: NetEffect (Intel), Chelsio, Mellanox, ServerEngines
Supported Operating Systems: Linux
User Space APIs
Numerous NIC vendors are now developing User Space APIs that give you the benefits of iWARP
without having to re-write your application. This middleware translates between conventional sockets
programming and the hardware; a generic sketch of how such interposition can work follows Figure 5.
Key Players: Myricom and Solarflare
Figure 5: User Space Library Software Block Diagram
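One common way such middleware hooks into an unmodified application is symbol interposition: a preloaded shared library overrides the standard socket calls and redirects them to a user-space stack. The sketch below illustrates only that general mechanism, not any vendor's actual implementation; here the "accelerated" path simply logs the call and falls through to the real kernel sendto().

```c
/* Generic sketch of socket-call interposition via LD_PRELOAD. A real
 * acceleration library would hand the datagram to its user-space stack
 * instead of calling back into the kernel.
 * Build: gcc -shared -fPIC -o shim.so shim.c -ldl
 * Run:   LD_PRELOAD=./shim.so ./your_app */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

ssize_t sendto(int fd, const void *buf, size_t len, int flags,
               const struct sockaddr *dest, socklen_t destlen) {
    /* Look up the real kernel-backed sendto() once. */
    static ssize_t (*real_sendto)(int, const void *, size_t, int,
                                  const struct sockaddr *, socklen_t);
    if (!real_sendto)
        real_sendto = dlsym(RTLD_NEXT, "sendto");

    /* Acceleration hook would go here; this sketch just logs and passes through. */
    fprintf(stderr, "intercepted sendto() of %zu bytes\n", len);
    return real_sendto(fd, buf, len, flags, dest, destlen);
}
```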
MX (Myrinet Express)
Myricom has its roots in High Performance Computing. It originally developed an HPC protocol
called Myrinet but has since shifted its development toward 10GE.
Key Players: Myricom
Supported Operating Systems: Linux and Windows
TCP/UDP Acceleration
OpenOnload
Key Players: Solarflare
Supported Operating Systems: Linux
TCP/UDP Acceleration
Figure 6: OpenOnload Software Block Diagram
SR-IOV (Single Root Input/Output Virtualization)
SR-IOV was originally designed for Virtual Machine environments. It allows a single 10GE NIC
to be divided into multiple Virtual NICs (vNICs), which are then mapped to Virtual Machines. The same
concept can be applied in a non-virtualized environment, mapping each vNIC to Application Memory,
once again providing Kernel Bypass/Zero Copy functionality.
Key Players: Server Engines (Chelsio, NetEffect, Mellanox, Broadcom, and Neterion in Future)
Supported Operating Systems:
TCP/UDP Acceleration
Figure 7: SR-IOV in a Virtualized Server Environment
How Does This Affect My Applications?
Cisco has teamed with NetEffect (Intel) to provide a solution that delivers the theoretical advantages
of Infiniband without the drawbacks. Cisco and NetEffect combined forces to write a middleware called
RAB, or RDMA Accelerated Buffers, which is optimized for use with Wombat Data Fabric. Cisco is also
exploring another middleware called DAL, or Datagram Acceleration Layer, which could be used with
TIBCO RV or any other application using UDP Multicast. This middleware allows for decreased CPU
Utilization and reduced In-Host Latency with no modifications to your application.
Figure 8: RAB and DAL Middleware Software Block Diagram
Conventional Server I/O requires packets to be processed by the server’s CPU using the Operating
System’s UDP/IP Stack. This involves multiple interrupts, context switches, and copies of the data. This
ultimately leads to high CPU Utilization and unnecessary In-Host Latency.
Figure 9: Conventional UDP/IP Communication
DAL Middleware intercepts conventional socket calls and writes them directly to the NetEffect NIC,
providing the Kernel Bypass/Zero Copy functionality without the headache of re-writing your
application, as was required with Infiniband.
Figure 10: Kernel Bypass, Zero Copy Communication with DAL
How Does This Impact Latency?
As mentioned earlier, with today’s low-latency networks, roughly 90% of latency actually resides within
the server itself rather than in the network. We performed a baseline test and found ping-pong latency
to be on the order of 35-40 uSec, with about 30 uSec residing within the server and only 7 uSec
contributed by the core switching infrastructure.
Figure 11: Sources of End to End Latency
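As an illustration of how an application-level ping-pong number of this kind can be obtained, the sketch below timestamps a small UDP request and its echoed reply and reports half the round-trip time. The peer address and port are placeholders, the echo responder is assumed to exist, and this is not the test harness used for the figures quoted above.

```c
/* Sketch of a simple application-layer ping-pong latency measurement:
 * send a small UDP request, wait for the echo, report half the RTT. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5001);                        /* placeholder port */
    inet_pton(AF_INET, "192.0.2.10", &peer.sin_addr);   /* placeholder address */
    connect(fd, (struct sockaddr *)&peer, sizeof(peer));

    char msg[64] = {0}, reply[64];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    send(fd, msg, sizeof(msg), 0);                      /* request */
    recv(fd, reply, sizeof(reply), 0);                  /* echoed reply */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double rtt_us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("one-way (half round-trip) latency: %.1f uSec\n", rtt_us / 2.0);

    close(fd);
    return 0;
}
```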
We are seeing Market Data and High Performance Computing environments move to 10GE not only for
added throughput but also for reduced latency. Moving from GE to 10GE reduces Serialization Delay by an order
of magnitude; for Jumbo Frames, this decreases serialization latency from 72 to 7.2 uSec, as
seen below.
Table 1: Serialization Delay Comparison
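The serialization-delay figures can be checked directly: a frame's serialization delay is its size in bits divided by the link rate. The short program below is included purely as a worked check and reproduces the 72 uSec and 7.2 uSec values for a 9000-byte jumbo frame at 1 Gbps and 10 Gbps.

```c
/* Worked check of the serialization-delay numbers in Table 1:
 * delay = frame size in bits / link rate in bits per second. */
#include <stdio.h>

static double serialization_delay_us(double frame_bytes, double link_bps) {
    return frame_bytes * 8.0 / link_bps * 1e6;   /* seconds -> microseconds */
}

int main(void) {
    printf("9000-byte frame at  1 Gbps: %.1f uSec\n",
           serialization_delay_us(9000, 1e9));    /* 72.0 */
    printf("9000-byte frame at 10 Gbps: %.1f uSec\n",
           serialization_delay_us(9000, 10e9));   /* 7.2 */
    return 0;
}
```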
Furthermore, by utilizing the User Space APIs and the Kernel Bypass/Zero Copy functionality they
provide, we have seen Application Layer to Application Layer Latency reduced to less than 6 uSec.
Table 2: Latency Comparison
Overall, moving from GE to 10GE yields an End-to-End latency decrease of over 80%.
Figure 12: End to End Latency Comparison