This slide deck was presented at a Docker Meetup in Melbourne in March 2016. Linux network namespaces and how they work together with Docker are covered in detail as an introduction to the presentation. The main part discusses a solution that uses VXLAN networks together with EVPN BGP signalling to route traffic between Docker containers.
2. Contents
● Linux network namespaces
○ Introduction
○ Binding interface to namespace
● Docker networking
○ Namespaces
○ Inbound and Outbound traffic flows
○ Clustered environments
○ Challenges
● VXLAN
○ Introduction
○ VXLAN signalling
○ VXLAN and Docker
● BGP
○ Routing VXLAN with BGP
○ Scaling VXLAN based Docker networks with BGP
○ PoC
● What wasn’t covered in this presentation
3. Linux network namespaces
Network namespaces are a part of the containerisation technology used by the Linux kernel
Network namespaces allow:
○ Creating isolated network instances (namespaces) for Linux containers
○ Each with its own routing table, virtual interfaces and L2 isolation
● The tool used to operate on network namespaces: iproute2
● Network namespaces are stored in
○ /var/run/netns
● There are two types of network namespaces (illustrated below):
○ Root namespace [ ip link ]
○ Non-root namespace [ ip netns .. ip link ]
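As a rough illustration of the two types above (the namespace name NAMESPACE1 is just an example, reused on the next slides):
ip link                             # root namespace: lists the host interfaces
ip netns add NAMESPACE1             # create a non-root namespace
ip netns exec NAMESPACE1 ip link    # the same command, run inside NAMESPACE1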
4. Bind interface to network namespace
When a network namespace is created it has only one interface, loopback.
We can create a pair of peered ip links in the root namespace and then change the namespace of eth0-NAMESPACE1 from Root to NAMESPACE1 (commands sketched below):
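The iproute2 commands for these steps look roughly like this (the peer name eth0-NAMESPACE1 follows the slide; the root-namespace end's name veth-NAMESPACE1 is an assumption):
ip link add veth-NAMESPACE1 type veth peer name eth0-NAMESPACE1   # peered ip links in the root namespace
ip link set eth0-NAMESPACE1 netns NAMESPACE1                      # move one end of the pair into NAMESPACE1
ip netns exec NAMESPACE1 ip link                                  # eth0-NAMESPACE1 now appears here, still DOWN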
5. Bringing namespaced interface UP
We can rename the interface inside the namespace and try to bring it UP.
After bringing UP the veth part of the pipe, the interface inside NAMESPACE1 also becomes UP.
Finally, assign an IP address to the eth0 interface inside NAMESPACE1 (commands below):
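Roughly, the commands for this slide are (the 172.16.1.2/24 address and the veth-NAMESPACE1 name are only examples):
ip netns exec NAMESPACE1 ip link set eth0-NAMESPACE1 name eth0    # rename the interface inside the namespace
ip netns exec NAMESPACE1 ip link set eth0 up                      # try to bring it UP
ip link set veth-NAMESPACE1 up                                    # bring UP the veth part of the pipe
ip netns exec NAMESPACE1 ip addr add 172.16.1.2/24 dev eth0       # assign an IP address inside NAMESPACE1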
6. Docker and network namespaces
Docker supports different containerisation backends:
● libcontainer - Docker's own native Go implementation on top of the kernel containerisation capabilities. Default since 0.9
● LXC - was the default before 0.9
Since Docker uses libcontainer, the network namespace created for a container will not be seen in the ip netns output.
However, it is possible to expose it if you know the Docker container's process PID:
PID=$(docker inspect -f '{{.State.Pid}}' $container_id)
ln -s /proc/$PID/ns/net /var/run/netns/$PID
Instead of $PID you can use any name for the symlink, the container_id for example; the namespace can then be inspected as shown below.
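Once the symlink exists, the container's namespace is visible to the normal tooling, for example:
ip netns list                     # the PID (or the name you chose) is now listed
ip netns exec $PID ip addr show   # shows the container's eth0 and its addresses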
7. Docker networking: introduction
What Docker does for you:
- Creates an ip link pair: vethXXXXXX <-> eth0 inside the container's namespace
- Attaches the vethXXXXXX interface (the tunnel end in the Root namespace) to the docker0 bridge (by default)
- Sets up an ip address from the docker0 network range
- Creates a rule in iptables that organises NAT (PAT) translation for you, masquerading the containers' network behind the default eth0 interface
All of this can be checked on the host as shown below.
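A quick way to see the result (a hedged sketch; the veth names and addresses will differ per host):
ip link show master docker0       # the vethXXXXXX ends attached to the docker0 bridge
ip addr show docker0              # the bridge address the containers are NATed behind
iptables -t nat -S POSTROUTING    # the MASQUERADE rule for the containers' subnet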
8. Docker networking: exposing ports
Docker can expose internal ports and even interfaces:
- Network type: host. No network namespace isolation; the root namespace will be used
- Supply port numbers to be exposed: iptables rules will be created to allow the given port number(s) and to create a port-mapping (port translation) rule
An example is sketched below.
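For example (the image name and port numbers are illustrative only):
docker run -d -p 8080:80 nginx    # publish container port 80 on host port 8080
iptables -t nat -S DOCKER         # shows the DNAT (port translation) rule Docker created
docker run -d --net=host nginx    # host network type: no namespace isolation at all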
9. Docker networking: Clustered environments
Docker now offers multi-host networking: a KV store is used to signal the network, and clustering is provided by Docker Swarm. The overlay transport requires Linux kernel version > 3.17 (a daemon configuration sketch follows below).
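A hedged example of the daemon configuration for this setup (the Consul address and port are assumptions):
docker daemon --cluster-store=consul://192.168.33.30:8500 \
              --cluster-advertise=eth0:2376
The overlay network itself is then created with docker network create, as shown on the VXLAN and Docker slide.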
10. Current challenges
The KV store approach is a great way to interconnect different docker-running nodes in Docker-only environments. But it still has scalability limitations for WAN, multi-datacenter and not-only-Docker scenarios.
- Modern service-oriented applications consist of multiple processes. Sometimes a platform can be described as 30-40 applications, which would be great to containerise
- Old networking growing pains could return: broadcast domain problems, segmentation, etc.
- Docker offers VXLAN support, which allows you to scale to a certain extent. But how do you distribute knowledge of the VXLAN database to non-Docker networks?
11. VXLAN introduction
VXLAN is an overlay networking technology that allows Ethernet traffic to be sent encapsulated in UDP datagrams over plain IP networks. A detailed description of VXLAN networking can be found in RFC 7348.
The 24-bit VNI field is the VXLAN address field; it can be compared with the 802.1Q tag of Ethernet frames or an MPLS label.
Bear in mind the MTU value when using VXLAN, since the encapsulation adds extra header overhead (see below).
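For example, a VXLAN interface is typically given a reduced MTU so that the extra outer headers still fit into a 1500-byte underlay MTU (the interface name and VNI are examples):
ip link add vxlan42 type vxlan id 42 dstport 4789 dev eth0
ip link set vxlan42 mtu 1450    # leave room for the VXLAN/UDP/IP/Ethernet encapsulation overhead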
12. VXLAN signalling
A VXLAN network has to be properly signalled, otherwise the participating hosts will not know about each other's existence. In terms of signalling, this particular information has to be advertised:
- VXLAN Tunnel End-Point (VTEP) - identifies the endpoint, the entity that originates and terminates VXLAN tunnels
- VXLAN Network Identifier (VNI) - identifies the network, similar to an 802.1Q tag or an MPLS label
- IP and MAC addresses
Ways of signalling VXLAN:
- Unicast way - dedicated controller
- Multicast way - using PIM, with VNI:VTEP pairs propagated as multicast routes (see the example after this list)
- Docker has implementation with KV store
- OpenContrail can use XMPP
- BGP
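As an illustration of the multicast way (group address, VNI and interface names are examples):
ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
# BUM traffic is flooded to the multicast group and remote MAC:VTEP mappings are learned from received frames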
13. VXLAN signalling with BGP: EVPN
Using the BGP protocol to carry VXLAN and MAC/IP information is described in the following RFCs and drafts:
- http://tools.ietf.org/html/rfc7432
- https://tools.ietf.org/html/draft-ietf-bess-evpn-overlay-02
- https://tools.ietf.org/html/rfc4684
The BGP protocol is designed to be highly extensible, which is why it is possible to use NLRI
to carry information other than IPv4/IPv6 routes.
For EVPN the following address family values were allocated:
● AFI 25 - which corresponds to L2VPN network signalling over BGP (the Kompella approach)
● SAFI 70 - the Subsequent Address Family Identifier for EVPN (VXLAN)
Basically, VXLAN information is carried as BGP routes.
14. VXLAN and Docker
To create multi-tenant Docker networks with advanced isolation we can use VXLAN in the following way:
- Create a dedicated interface of type vxlan
- Create a bridge interface where we stitch together the vxlan interface and the Root-namespace leg of the container interface
- Create a forwarding table entry:
bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0
- It will be signalled using multicast address 239.1.1.1 on port 4789 (multicast must be supported); a command-level sketch follows at the end of this slide
OR
- Configure KV store parameters as daemon arguments and create overlay network
- docker network create --driver overlay my-multi-host-network
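Put together, the manual variant looks roughly like this (interface names and the VNI are examples; the MAC and 192.19.0.2 are the remote container and its VTEP from the fdb entry above):
ip link add vxlan0 type vxlan id 42 dstport 4789 dev eth0       # dedicated vxlan-type interface
ip link add br-vxlan type bridge                                # bridge to stitch things together
ip link set vxlan0 master br-vxlan
ip link set vethXXXXXX master br-vxlan                          # the container's Root-namespace leg
ip link set br-vxlan up && ip link set vxlan0 up
bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0   # static entry pointing the remote MAC at its VTEP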
16. Docker with EVPN and BGP
To achieve a highly scalable network for Docker we can use:
- VXLAN as the forwarding plane to carry network traffic and isolate different container groups and hosts
- BGP to signal VXLAN and manage large multi-datacenter networks
- A CNI plugin to bring EVPN tunnels up automatically (Kubernetes)
A BGP implementation for VXLAN/EVPN written in Python: bagpipe-bgp, based on ExaBGP code
https://github.com/Orange-OpenSource/bagpipe-bgp
A BGP implementation in Go - GoBGP - used as the Route Reflector (configuration sketch below) https://github.com/osrg/gobgp
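A hedged sketch of a gobgpd route-reflector configuration with the EVPN family enabled (the AS number is an assumption, the addresses follow the demo, and the exact TOML layout should be checked against the GoBGP documentation):
[global.config]
  as = 64512
  router-id = "192.168.33.30"

[[neighbors]]
  [neighbors.config]
    neighbor-address = "192.168.33.10"   # dockerbgp1; repeat the block for the other speakers
    peer-as = 64512
  [neighbors.route-reflector.config]
    route-reflector-client = true
    route-reflector-cluster-id = "192.168.33.30"
  [[neighbors.afi-safis]]
    [neighbors.afi-safis.config]
      afi-safi-name = "l2vpn-evpn"       # carry EVPN (VXLAN) routes over this session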
19. DEMO
Description:
- 4 virtual machines: 3 running bagpipe-bgp and 1 running the goBGP route reflector
- dockerbgp1, dockerbgp2 and dockerbgp3 establish BGP sessions to the
goBGP RR: 192.168.33.30
- dockerbgp1: 192.168.33.10, running a web server
- dockerbgp2: 192.168.33.20, running curl
- dockerbgp3: 192.168.33.30, just busybox for a ping test
EVPN network: 192.168.10.0/24
IP network for hosts: 192.168.33.0/24
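Typical checks during the demo (a rough reconstruction; the container address 192.168.10.1 is an assumption within the EVPN network):
gobgp global rib -a evpn      # on the RR: EVPN routes learned from the bagpipe-bgp speakers
curl http://192.168.10.1      # from dockerbgp2: reach the web server container over the EVPN network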
20. What we did not cover
- Another BGP project for Docker and Kubernetes IP networking:
https://www.projectcalico.org/why-bgp/
- CNI, the Container Network Interface, is a proposed standard for
configuring network interfaces for Linux application containers.
https://github.com/appc/cni
- IP VPN networks using Bagpipe BGP and Open vSwitch