
Webinar- Tea for the Tillerman


Watch this presentation and learn about Kubernetes Networking:

How to build modern, cloud-friendly applications in an agile fashion without having to know subnets and IP addresses.

Published in: Technology

  1. Tea For The Tillerman: Building a Pure L3 Fabric For Kubernetes Networking. Kelsey Hightower & Dinesh G Dutt, 19 April 2016
  2. Key Takeaways
     • Modern application design has evolved to ignore antediluvian ideas for service deployment, discovery and advertisement
     • Kubernetes is an easy, scalable solution for deploying applications in the modern DC
     • Routing on the host makes Kubernetes deployments optimal
     April 21, 2016
  3. Applications and servers are the last bastion of bridging
  4. How Bridging Plays A Role in Application Design
     • Service or node discovery relies on broadcast
     • Cluster heartbeat uses multicast
     • Assumptions about being in a single subnet
     • VM mobility continued this trend
  5. Reasons Why Bridging Is How Compute Folks Think About Networks
     • In the bad old days, IP routing was a low-performance, high-cost solution, since L2 switching was done in hardware
     • Vendors still charge extra for L3 licenses on the same box
         • BGP costs even more money than OSPF
     • No good routing protocol stack on the host
     • L3 is considered complex to configure and troubleshoot compared to (mythical) plug-and-play L2
  6. Open Networking
     • Merchant switching silicon can perform bridging and IP routing at the same performance and price
     • Open Networking solutions such as Cumulus Linux offer routing at the same price point as bridging
  7. Routing Protocol Suite on Host
     • Many high-quality open source routing suites are now available for the host
         • Cumulus Quagga
         • BIRD
         • ExaBGP
     • Commercial offerings are also coming in
         • Windows Server 2012
  8. Simplifying Routing
     • Solutions such as OSPF Unnumbered and BGP Unnumbered, coupled with automation, dramatically simplify routing
  9. OK, so how are modern applications designed if we have a pure L3 network?
  10. Google Cloud Platform. Kubernetes Demystifying Networking Webinar, 5/19/2016. Kelsey Hightower, Staff Developer Advocate, @kelseyhightower
  11. Google has been developing and using containers to manage our applications for over 12 years. (Images by Connie Zhou)
  12. Everything at Google runs in containers:
     • Gmail, Web Search, Maps, ...
     • MapReduce, batch, ...
     • GFS, Colossus, ...
     • Even Google’s Cloud Platform: our VMs run in containers!
  13. But it’s all so different!
     • Deployment
     • Management, monitoring
     • Isolation (very complicated!)
     • Updates
     • Discovery
     • Scaling, replication, sets
     A fundamentally different way of managing applications requires different tooling and abstractions. (Images by Connie Zhou)
  14. Kubernetes
     • Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
     • Manages container clusters
     • Inspired and informed by Google’s experiences and internal systems
     • Supports multiple cloud and bare-metal environments
     • Supports multiple container runtimes
     • 100% open source, written in Go
     Manage applications, not machines
  15. The 10000 foot view (diagram). Components shown: users, UI, CLI and API; master with apiserver, scheduler, controllers and etcd; nodes, each with a kubelet
  16. All you really care about (diagram): UI, API, Container Cluster
  17. Workload Portability
  18. Workload Portability. Goal: Avoid vendor lock-in
     • Runs in many environments, including “bare metal” and “your laptop”
     • The API and the implementation are 100% open
     • The whole system is modular and replaceable
  19. Workload Portability. Goal: Write once, run anywhere*
     • Don’t force apps to know about concepts that are cloud-provider-specific
     • Examples of this:
         ● Network model
         ● Ingress
         ● Service load-balancers
         ● PersistentVolumes
     * approximately
  20. Workload Portability. Goal: Avoid coupling
     • Don’t force apps to know about concepts that are Kubernetes-specific
     • Examples of this:
         ● Namespaces
         ● Services / DNS
  21. Pods
  22. Pods
     • Small group of containers & volumes
     • Tightly coupled
     • The atom of scheduling & placement
     • Shared namespace
         • share IP address & localhost
         • share IPC, etc.
     • Managed lifecycle
         • bound to a node, restart in place
         • can die, cannot be reborn with same ID
     • Example: data puller & web server
     Diagram: a Pod containing a File Puller, a Web Server and a shared Volume; the Content Manager feeds the File Puller and Consumers read from the Web Server
  23. Volumes
     • Very similar to Docker’s concept
     • Pod-scoped storage
     • Supports many types of volume plugins
         • Empty dir (and tmpfs)
         • Host path
         • Git repository
         • GCE Persistent Disk
         • AWS Elastic Block Store
         • Azure File Storage
         • iSCSI
         • Flocker
         • NFS
         • GlusterFS
         • Ceph File and RBD
         • Cinder
         • FibreChannel
         • Secret, ConfigMap, DownwardAPI
         • Flex (exec a binary)
  24. ReplicationControllers
  25. ReplicationControllers
     • A simple control loop
     • Runs out-of-process wrt API server
     • Has 1 job: ensure N copies of a pod
         • if too few, start some
         • if too many, kill some
         • grouped by a selector
     • Cleanly layered on top of the core
         • all access is by public APIs
     • Replicated pods are fungible
         • no implied order or identity
     Diagram: a ReplicationController (name = “my-rc”, selector = {“App”: “MyApp”}, podTemplate = { ... }, replicas = 4) asks the API Server “How many?”; the answer is 3, so it asks to start 1 more (OK); asked again, the answer is 4
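The control loop on this slide can be illustrated with a minimal Python sketch. It is not the Kubernetes implementation: the pod dictionaries, the `reconcile` function and the `new_name` callback are hypothetical names chosen for illustration, and a real controller would talk to the API server rather than edit a local list.

```python
import itertools

def matches(selector, labels):
    """True if every key/value in the selector appears in the pod's labels."""
    return all(labels.get(k) == v for k, v in selector.items())

def reconcile(pods, selector, replicas, new_name):
    """One pass of the loop: return the pod list after starting or killing pods."""
    selected = [p for p in pods if matches(selector, p["labels"])]
    others = [p for p in pods if not matches(selector, p["labels"])]
    if len(selected) < replicas:
        # Too few: start some, labeled so the selector picks them up.
        for _ in range(replicas - len(selected)):
            selected.append({"name": new_name(), "labels": dict(selector)})
    elif len(selected) > replicas:
        # Too many: kill the surplus (pods are fungible, so any will do).
        selected = selected[:replicas]
    return others + selected

# Hypothetical starting state: 3 pods exist, 4 are wanted.
counter = itertools.count(1)
pods = [{"name": f"my-rc-{next(counter)}", "labels": {"App": "MyApp"}} for _ in range(3)]
pods = reconcile(pods, {"App": "MyApp"}, 4, lambda: f"my-rc-{next(counter)}")
print(len(pods))
```

Because the pods carry no order or identity, the sketch is free to drop any surplus pod; that mirrors the "fungible" point on the slide.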
  26. Deployments
  27. Deployments
     • Goal: updates-as-a-service
         • Rolling update is imperative, client-side
     • Deployment manages replica changes for you
         • stable object name
         • updates are configurable, done server-side
         • kubectl edit or kubectl apply
     • Aggregates stats
     • Can have multiple updates in flight
     • Status: BETA in Kubernetes v1.2
  28. Namespaces
  29. Namespaces
     • Problem: I have too much stuff!
         • name collisions in the API
         • poor isolation between users
         • don’t want to expose things like Secrets
     • Solution: Slice up the cluster
         • create new Namespaces as needed
         • per-user, per-app, per-department, etc.
         • part of the API - NOT private machines
         • most API objects are namespaced
         • part of the REST URL path
         • Namespaces are just another API object
         • one-step cleanup - delete the Namespace
         • obvious hook for policy enforcement (e.g. quota)
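To make "part of the REST URL path" concrete, here is a small Python sketch that builds paths in the layout used by the core `v1` API group. The helper name `resource_path` and the namespace and pod names are hypothetical, and other API groups use a different prefix.

```python
def resource_path(namespace, resource, name=None):
    """Build the REST path of a namespaced object in the core v1 API group."""
    path = f"/api/v1/namespaces/{namespace}/{resource}"
    return f"{path}/{name}" if name else path

# The same pod name in two namespaces yields two distinct API objects.
print(resource_path("team-a", "pods", "web-1"))
print(resource_path("team-b", "pods", "web-1"))
```

Since the namespace is a path segment, listing or deleting everything under it is a natural operation, which fits the one-step cleanup mentioned on the slide.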
  30. Networking
  31. Docker networking
  32. Docker networking (diagram; every path is labeled NAT)
  33. Host ports (diagram, shown on slides 33 and 34): containers A: 3306, B: 80 and C: 8000; host ports 9376 and 11878; SNAT
  35. Kubernetes networking
     • IPs are routable
         • vs docker default private IP
     • Pods can reach each other without NAT
         • even across nodes
     • No brokering of port numbers
         • too complex, why bother?
     • This is a fundamental requirement
         • can be L3 routed
         • can be underlayed (cloud)
         • can be overlayed (SDN)
  36. Kubernetes networking
  37. Network Isolation
  38. Network Isolation
     • Describe the DAG of your app, enforce it in the network
     • Restrict Pod-to-Pod traffic or across Namespaces
     • Designed by the network SIG
         • implementations for Calico, OpenShift, Romana, OpenContrail (so far)
     • Status: Alpha in v1.2, expect beta in v1.3
  39. Network Plugins
  40. Network Plugins
     • Introduced in Kubernetes v1.0
         • VERY experimental
     • Uses CNI (CoreOS) in v1.1
         • simple exec interface
         • not using Docker libnetwork
         • but can defer to Docker for networking
     • Cluster admins can customize their installs
         • DHCP, MACVLAN, Flannel, custom
     Diagram: net, Plugin, Plugin, Plugin
  41. Services
  42. Services
     • A group of pods that work together
         • grouped by a selector
     • Defines access policy
         • “load balanced” or “headless”
     • Gets a stable virtual IP and port
         • sometimes called the service portal
         • also a DNS name
     • VIP is managed by kube-proxy
         • watches all services
         • updates iptables when backends change
     • Hides complexity - ideal for non-native apps
     Diagram: Client, Virtual IP
  43. kube-proxy and iptables (animated diagram, slides 43-57; each node, including Node X, runs kube-proxy and iptables and talks to the apiserver)
     • kube-proxy on every node watches the apiserver for services & endpoints
     • kubectl run ... creates pods, which are scheduled onto nodes
     • kubectl expose ... creates a service; kube-proxy sees “new service!” and configures iptables, which now holds the VIP
     • kube-proxy sees “new endpoints!” and configures iptables again for the VIP
     • A Client then sends traffic to the VIP
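The watch-and-update sequence in these slides can be sketched in Python as an event-driven table. This is a toy model, not kube-proxy: the real component programs iptables rules, while the hypothetical `ProxyTable` class below just keeps dictionaries, and all IP addresses and the service name are made up. Choosing a backend at random is an assumption made for the sketch.

```python
import random

class ProxyTable:
    """Toy stand-in for the per-node rules that kube-proxy programs."""

    def __init__(self):
        self.vips = {}       # service name -> (virtual IP, port)
        self.backends = {}   # service name -> list of (pod IP, port)

    def on_service(self, name, vip, port):
        # "new service!": remember the VIP so traffic to it can be captured.
        self.vips[name] = (vip, port)
        self.backends.setdefault(name, [])

    def on_endpoints(self, name, endpoints):
        # "new endpoints!": replace the backend set for this service.
        self.backends[name] = list(endpoints)

    def resolve(self, dst_ip, dst_port, rng=random):
        """Map a connection to a VIP onto one backend, or None if no match."""
        for name, vip in self.vips.items():
            if vip == (dst_ip, dst_port) and self.backends[name]:
                return rng.choice(self.backends[name])
        return None

table = ProxyTable()
table.on_service("my-svc", "10.0.0.10", 80)
table.on_endpoints("my-svc", [("10.1.1.5", 8080), ("10.1.2.7", 8080)])
print(table.resolve("10.0.0.10", 80))
```

The client only ever sees the stable VIP; the backend list can change underneath it whenever a new endpoints event arrives.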
  58. External Services
     • Service IPs are only available inside the cluster
     • Need to receive traffic from “the outside world”
     • Built-in: Service “type”
         • NodePort: expose on a port on every node
         • LoadBalancer: provision a cloud load-balancer
     • DIY load-balancer solutions
         • socat (for nodePort remapping)
         • haproxy
         • nginx
  59. Ingress (L7)
     • Many apps are HTTP/HTTPS
     • Services are L3/L4 (IP + port)
     • Ingress maps incoming traffic to backend services
         • by HTTP host headers
         • by HTTP URL paths
     • HAProxy, NGINX, AWS and GCE implementations in progress
     • Now with SSL!
     • Status: BETA in Kubernetes v1.2
     Diagram: Client, URL Map
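The host-and-path mapping that Ingress performs can be sketched as a small Python lookup. The rule layout, the `route` function, the hostnames and the service names are all hypothetical, and picking the longest matching path prefix is an assumption of this sketch; real Ingress controllers define their own matching rules.

```python
def route(rules, host, path):
    """Return the backend service for a request, or None if nothing matches."""
    best = None
    for rule in rules:
        if rule["host"] != host:
            continue  # match on the HTTP Host header first
        for prefix, service in rule["paths"]:
            # Then match on the URL path, preferring the longest prefix.
            if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
                best = (prefix, service)
    return best[1] if best else None

rules = [
    {"host": "shop.example.com", "paths": [("/", "web"), ("/api", "api")]},
    {"host": "blog.example.com", "paths": [("/", "blog")]},
]
print(route(rules, "shop.example.com", "/api/items"))
```

This is what distinguishes L7 Ingress from an L3/L4 Service: the decision uses fields inside the HTTP request, not just an IP and port.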
  60. DNS
     • Run SkyDNS as a pod in the cluster
         • kube2sky bridges Kubernetes API -> SkyDNS
         • tell kubelets about it (static service IP)
     • Strictly optional, but practically required
         • LOTS of things depend on it
         • probably will become more integrated
     • Or plug in your own!
  61. Community
     • Top 0.01% of all GitHub projects
     • 1200+ external projects based on k8s
     • 800+ unique contributors
     • Companies Contributing, Companies Using
  62. Kubernetes is Open: open community, open design, open source, open to ideas. Code: Chat: Twitter: @kubernetesio
  63. Tea For The Tillerman: Routing On the Host
  64. Completing the Kubernetes Puzzle
     • How do we announce the routes required by Kubernetes across pods?
     • Run a routing protocol on the host
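As a rough illustration of what running a routing protocol on the host buys, the Python sketch below models each host announcing its pod subnet and the fabric resolving a pod address by longest-prefix match. The host names, the 10.244.x.0/24 subnets and both function names are assumptions for illustration; no actual BGP or OSPF exchange is modeled.

```python
import ipaddress

def build_routes(announcements):
    """announcements: {next hop: [prefix, ...]} -> list of (network, next hop)."""
    routes = []
    for next_hop, prefixes in announcements.items():
        for prefix in prefixes:
            routes.append((ipaddress.ip_network(prefix), next_hop))
    return routes

def lookup(routes, dst):
    """Longest-prefix match for a destination address."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Each host announces the (hypothetical) subnet its pods live in.
routes = build_routes({
    "host-a": ["10.244.1.0/24"],
    "host-b": ["10.244.2.0/24"],
})
print(lookup(routes, "10.244.2.17"))  # host-b
```

With the pod subnets in the routing table, a pod on one host can reach a pod on another without NAT, which is the requirement stated in the Kubernetes networking slide.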
  65. What If Host Configuration Could Be As Simple As…
       neighbor eth0
       redistribute connected
  66. What Cumulus Quagga Will Be in 3.0
       router bgp 65534
        bgp router-id
        neighbor eth0 interface remote-as external
        redistribute connected
  67. More Details
     • Two ways to use BGP on the host:
         • Using Dynamic Neighbors
         • Using BGP Unnumbered
     • Use of ASN:
         • All servers use the same ASN
  68. BGP on Host: Dynamic Neighbors
     • ToR is configured with the subnet from which clients can connect
     • Clients initiate the connection
     • Rest of operation is regular BGP
       bgp listen range peer-group SERVER
       bgp listen limit 8
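The admission logic behind dynamic neighbors can be sketched in Python: the ToR accepts a session only if the client's address is inside the configured listen range and the peer limit has not been reached. The class name and the 10.0.1.0/24 range are hypothetical (the slide's own range is not shown); the limit of 8 comes from the slide. Session state and the BGP protocol itself are not modeled.

```python
import ipaddress

class DynamicNeighbors:
    """Toy model of a ToR that accepts BGP clients from a listen range."""

    def __init__(self, listen_range, limit):
        self.listen_range = ipaddress.ip_network(listen_range)
        self.limit = limit
        self.peers = set()

    def accept(self, client_ip):
        """Return True if the client-initiated session is admitted."""
        addr = ipaddress.ip_address(client_ip)
        if addr in self.peers:
            return True  # already an established dynamic peer
        if addr not in self.listen_range or len(self.peers) >= self.limit:
            return False
        self.peers.add(addr)
        return True

tor = DynamicNeighbors("10.0.1.0/24", limit=8)  # range is an assumed example
print(tor.accept("10.0.1.5"), tor.accept("10.0.2.5"))
```

The ToR never lists servers individually, so adding a server needs no switch configuration change as long as it connects from within the range.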
  69. BGP on Host: Unnumbered Configuration
     • Connection to servers is not bridged, but p2p
         • Pure L3
     • Interface-based configuration with remote-as external
  70. And for the OSPF Aficionados
       interface eth0
        ip ospf area
       router ospf
        ospf router-id
        area stub no-summary
        passive-interface docker0
  71. Seat Belts With Routing On The Host
     • Hosts are always stub networks, never transit
         • Hosts are in a separate area from the rest of the network with OSPF
     • Announce only the default route to the host
     • Accept only specified prefixes from the host
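The two filtering rules on this slide can be expressed as a short Python sketch. The function names and the 10.244.0.0/16 allowed range are hypothetical; on a real switch these rules would be written as route maps or prefix lists in the routing suite rather than in Python.

```python
import ipaddress

def accept_from_host(prefix, allowed):
    """Accept a host's announcement only if it falls inside an allowed prefix."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(ipaddress.ip_network(a)) for a in allowed)

def announce_to_host():
    """The host is a stub: it only needs a default route."""
    return ["0.0.0.0/0"]

allowed = ["10.244.0.0/16"]  # assumed range reserved for pod subnets
print(accept_from_host("10.244.1.0/24", allowed))
print(accept_from_host("0.0.0.0/0", allowed))
```

Rejecting anything outside the allowed range, including a default route, is what keeps a misconfigured host from becoming a transit path for the rest of the network.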
  72. Customers Running Cumulus Quagga on the Host
     • All container-based apps
         • One mid-size customer is running with OSPF
         • One small-to-mid-size customer is running with BGP Unnumbered
         • One mid-to-large-size customer is running with BGP
         • 300+ OpenStack cluster with VxLAN and Routing To The Host
         • Multiple other customers in PoC or pre-production
  73. Summing Up
  74. Building Pure L3 Fabrics is real
         • Networks, Compute and Applications are showing how to do this
         • Standards-based, robust, scalable design
     Kubernetes provides a framework for deploying containerized applications
         • It’s what Google pushed out after years of internal deployment
     High-quality open source routing stacks are available for hosts
  75. Thank You! © 2016 Cumulus Networks. Cumulus Networks, the Cumulus Networks Logo, and Cumulus Linux are trademarks or registered trademarks of Cumulus Networks, Inc. or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide basis.