In this talk, we will first cover, in some depth, the kinds of deployment challenges for which Cluster Federation affords elegant solutions, including:
- Resilience against data center (or more precisely availability zone) outages. You don't want your customers to experience a service outage just because your cloud provider had an "oops" in one of their availability zones, do you?
- Migrating your application from your on-premise data center to a cloud provider can be challenging. And what about rollback if it doesn't work quite as expected? Similarly, what happens when your company decides to change its preferred cloud provider? Apps don't migrate themselves, do they?
- Capacity overflow - you have on-premise data centers, but what if they're not big enough to handle large traffic spikes? Imagine if you could burst into cloud capacity when the next web mob comes knocking on your front door, credit cards in hand?
- Regulatory enforcement and auditing - imagine if you could just label your applications as requiring EU-compliant hosting, and have the deployment system make it so, despite the other constraints of capacity, performance, authorization, etc.
Then we will dive into a small set of primitives that make these kinds of challenges much easier, including:
- cross-cluster load balancing - how do you make sure that your customer's web request goes to the right place?
- location affinity - do your containers care whether they're on the same LAN as each other? If so, how much?
- cross-cluster scheduling - which containers should go in which data centers, on which cloud providers?
- cross-cluster service discovery - if your containers are in different data centers, on different cloud providers, how do they find each other in a reliable way?
- cross-cluster monitoring and auditing - how do you see all of these heterogeneous clusters as a logical whole?
Finally, we will cover some implementation details, and what the roadmap for Ubernetes looks like today.
KubeCon schedule link: http://sched.co/4WW9
9. Reason 1: High Availability
• Cloud providers have outages, yes, but...
• Has one of your application software upgrades ever gone terribly wrong?
• How about infrastructure upgrades (auth systems? quota? data store?)
• How about a fat-fingered config change?
• There are several interesting variants:
  • Multiple availability zones?
  • Multiple cloud providers?
Diagram: your paying customer → cross-cluster load balancer → Cluster 1, Cluster 2, Cluster 3
10. Reason 2: Application Migration
• Migrating applications between clusters is tedious and error-prone if done manually.
• Much like software upgrades, you *can* script them, but (K)ubernetes just does it quicker/safer/better.
• Now with rollback too!
• On-premise ↔ Cloud
• Amazon ↔ Google :-)
• ...
Diagram: the Ubernetes UI migrates workloads from an on-premise cluster to an in-cloud cluster (Migrate: On Premise→Cloud), or to a different cloud provider
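A stepwise migration with rollback can be sketched as follows. This is a minimal illustration under stated assumptions, not how Ubernetes implements migration: `migrate` and the `is_healthy` callback are hypothetical, and a production migration would bring replicas up in the target before removing them from the source rather than moving them in a single step.

```python
def migrate(replicas, source, target, step, is_healthy):
    """Shift replicas from source to target in batches of `step`.

    `is_healthy(cluster, count)` is a hypothetical health check run after each
    batch. If it fails, the original placement is restored (rollback).
    Returns the final {cluster: replica_count} placement.
    """
    original = {source: replicas, target: 0}
    placement = dict(original)
    while placement[source] > 0:
        batch = min(step, placement[source])
        placement[source] -= batch
        placement[target] += batch
        if not is_healthy(target, placement[target]):
            return dict(original)  # roll back to where we started
    return placement


# 10 replicas, moved 4 at a time.
print(migrate(10, "on-prem", "cloud", 4, lambda c, n: True))
# {'on-prem': 0, 'cloud': 10}
# Here the cloud side is assumed to fail once it holds more than 4 replicas.
print(migrate(10, "on-prem", "cloud", 4, lambda c, n: n <= 4))
# {'on-prem': 10, 'cloud': 0}
```

Moving in batches is what makes rollback cheap: at any failure point only a fraction of the workload has left the source cluster.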
11. Reason 3: Policy Enforcement
• Some data must be stored and processed within specified political jurisdictions, by law.
• Some software/data must be on premise and air-gapped, by company policy.
• Some business units get to use the expensive gear, some don't.
• Auditing is also a big deal, so funnelling all operations through a central control point makes this easier.
Diagram: the Ubernetes UI manages a U.S. cloud cluster, an E.U. cloud cluster and an on-premise cluster
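Label-based policy enforcement with a central audit trail might look like the sketch below. The label keys (`jurisdiction`, `location`, `air-gapped`), the `Cluster` record and `eligible_clusters` are all hypothetical; the point is only that placement is filtered by required labels and every decision is logged at one control point.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    labels: dict = field(default_factory=dict)  # hypothetical keys, e.g. {"jurisdiction": "eu"}


audit_log = []  # every placement decision is recorded at one central point


def eligible_clusters(app, required_labels, clusters):
    """Return names of clusters whose labels match every required key/value pair."""
    chosen = [
        c.name
        for c in clusters
        if all(c.labels.get(k) == v for k, v in required_labels.items())
    ]
    audit_log.append({"app": app, "required": dict(required_labels), "eligible": chosen})
    return chosen


clusters = [
    Cluster("us-cloud", {"jurisdiction": "us", "location": "cloud"}),
    Cluster("eu-cloud", {"jurisdiction": "eu", "location": "cloud"}),
    Cluster("on-prem", {"jurisdiction": "eu", "location": "on-premise", "air-gapped": "true"}),
]

print(eligible_clusters("billing", {"jurisdiction": "eu"}, clusters))
# ['eu-cloud', 'on-prem']
print(eligible_clusters("payroll", {"air-gapped": "true"}, clusters))
# ['on-prem']
```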
12. Reason 4: Vendor Lock-in Avoidance
• Make it easy to migrate applications between cloud providers.
• Run the same app on multiple cloud providers and choose the best one for your:
  • workload characteristics
  • budget
  • performance requirements
  • availability requirements
Diagram: the Ubernetes UI manages Kubernetes on GCE, Kubernetes on AWS and Kubernetes on-premise
13. Reason 5: Capacity Overflow
• Make intelligent placement decisions
  • Utilization
  • Cost
  • Performance
Diagram: a user asks Ubernetes to "Run my stuff"; Ubernetes places it across an on-premise cluster, a preferred cloud provider and another cloud provider
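Capacity overflow can be illustrated as a greedy "fill the cheapest cluster, spill to the next" placement. This sketch ranks clusters by cost only; `free_capacity` and `cost_per_replica` are assumed metrics, and utilization or performance could be folded into the sort key in the same way. It is not the Ubernetes scheduler.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    free_capacity: int       # replicas the cluster can still absorb (assumed metric)
    cost_per_replica: float  # assumed relative cost; on-premise treated as cheapest


def place_replicas(replicas, clusters):
    """Greedy overflow placement: fill the cheapest cluster first, spill to the next."""
    plan, remaining = {}, replicas
    for c in sorted(clusters, key=lambda c: c.cost_per_replica):
        if remaining == 0:
            break
        take = min(remaining, c.free_capacity)
        if take > 0:
            plan[c.name] = take
            remaining -= take
    if remaining:
        raise RuntimeError(f"{remaining} replicas could not be placed anywhere")
    return plan


clusters = [
    Cluster("other-cloud", free_capacity=50, cost_per_replica=1.5),
    Cluster("on-prem", free_capacity=10, cost_per_replica=0.0),
    Cluster("preferred-cloud", free_capacity=20, cost_per_replica=1.0),
]
print(place_replicas(25, clusters))  # {'on-prem': 10, 'preferred-cloud': 15}
print(place_replicas(40, clusters))  # {'on-prem': 10, 'preferred-cloud': 20, 'other-cloud': 10}
```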
15. Federation comes with some challenges...
Diagram: Zones A and B on Provider 1, Zone C on Provider 2, Zone D on Provider 1
• Different bandwidth charges/latency/throughput/reliability
• Different service discovery (but DNS!)
• Consolidated monitoring & alerting
16. Cross-cluster load balancing
• Geographically aware DNS gets clients to the "closest" healthy cluster.
• Standard Kubernetes service load balancing within each cluster.
• New L7 load balancers available soon.
• Can be extended to divert traffic away from "healthy-but-saturated" clusters.
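The cluster-selection decision behind geographically aware DNS can be sketched like this. It models only the choice of answer, not a DNS server; the latency estimates, utilization figures and the 0.9 saturation threshold are hypothetical inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterEndpoint:
    name: str
    latency_ms: float   # assumed: estimated latency from this client's location
    healthy: bool
    utilization: float  # assumed: 0.0-1.0 load reported by the cluster


SATURATION_THRESHOLD = 0.9  # hypothetical cut-off for "healthy-but-saturated"


def pick_cluster(endpoints) -> Optional[str]:
    """Choose the closest healthy cluster, avoiding saturated ones when possible."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    unsaturated = [e for e in healthy if e.utilization < SATURATION_THRESHOLD]
    # If every healthy cluster is saturated, still serve from the closest one.
    candidates = unsaturated or healthy
    return min(candidates, key=lambda e: e.latency_ms).name


endpoints = [
    ClusterEndpoint("us-east", latency_ms=20, healthy=True, utilization=0.95),
    ClusterEndpoint("us-west", latency_ms=70, healthy=True, utilization=0.50),
    ClusterEndpoint("europe", latency_ms=120, healthy=False, utilization=0.10),
]
print(pick_cluster(endpoints))  # us-west: us-east is closer but saturated
```

Falling back to a saturated cluster rather than returning nothing is a design choice: a slow answer is usually better for the customer than no answer.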
17. Cross-cluster service discovery
• DNS + Kubernetes cluster-local service discovery.
• Can default to cluster-local with failover to remote clusters.
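"Cluster-local first, with failover to remote clusters" reduces to a simple lookup order. In this sketch the nested dict stands in for per-cluster DNS records; `discover` and the data layout are hypothetical.

```python
def discover(service, local_cluster, endpoints_by_cluster, remote_order):
    """Resolve a service, preferring the local cluster and failing over to remotes.

    endpoints_by_cluster maps cluster -> {service: [addresses]}; it stands in for
    per-cluster DNS and is a hypothetical structure for illustration.
    Returns (cluster_name, addresses).
    """
    for cluster in [local_cluster, *remote_order]:
        addresses = endpoints_by_cluster.get(cluster, {}).get(service, [])
        if addresses:
            return cluster, addresses
    raise LookupError(f"no endpoints for {service!r} in any cluster")


endpoints_by_cluster = {
    "cluster-1": {"frontend": ["10.0.0.5"], "db": []},  # local db has no ready endpoints
    "cluster-2": {"db": ["10.1.0.7", "10.1.0.8"]},
}
print(discover("frontend", "cluster-1", endpoints_by_cluster, ["cluster-2"]))
# ('cluster-1', ['10.0.0.5'])
print(discover("db", "cluster-1", endpoints_by_cluster, ["cluster-2"]))
# ('cluster-2', ['10.1.0.7', '10.1.0.8'])
```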
18. Location affinity
• Strictly coupled pods/applications
  • High bandwidth requirements
  • Low latency requirements
  • High fidelity requirements
  • Cannot easily span clusters
• Loosely coupled
  • Opposite of above
  • Relatively easily distributed across clusters
• Preferentially coupled
  • Strongly coupled but can be migrated piecemeal.
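The three coupling classes can be expressed as a placement rule plus a migration rule. This is an illustrative sketch, not an Ubernetes API: `Coupling`, `place` and `migration_batches` are hypothetical names, and "the first cluster in the list is preferred" is an assumption.

```python
from enum import Enum


class Coupling(Enum):
    STRICT = "strict"              # must stay in one cluster
    PREFERENTIAL = "preferential"  # one cluster preferred; can be moved piecemeal
    LOOSE = "loose"                # can be spread across clusters


def place(replicas, clusters, coupling):
    """Return {cluster: replicas} for a group of pods with the given coupling."""
    if coupling in (Coupling.STRICT, Coupling.PREFERENTIAL):
        # Keep the whole group together in the first (assumed preferred) cluster.
        return {clusters[0]: replicas}
    # Loosely coupled: spread as evenly as possible across all clusters.
    base, extra = divmod(replicas, len(clusters))
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(clusters)}


def migration_batches(replicas, coupling, batch_size):
    """Strictly coupled groups move all at once; others may move in batches."""
    if coupling is Coupling.STRICT:
        return [replicas]
    full, rest = divmod(replicas, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


print(place(7, ["a", "b", "c"], Coupling.LOOSE))   # {'a': 3, 'b': 2, 'c': 2}
print(place(7, ["a", "b", "c"], Coupling.STRICT))  # {'a': 7}
print(migration_batches(10, Coupling.PREFERENTIAL, 4))  # [4, 4, 2]
```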
19. Cross-cluster monitoring and auditing...
• "Cluster per tab" might suffice for small numbers of clusters.
• Some monitoring solutions provide stronger integration and global summarization.
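Global summarization amounts to rolling per-cluster figures up into one view while still pointing at the clusters that need attention. The metric names (`running`, `desired`, `alerts`) and the `summarize` helper are hypothetical.

```python
def summarize(per_cluster):
    """Roll per-cluster stats up into one global view plus a list of degraded clusters.

    per_cluster maps cluster name -> {"running": int, "desired": int, "alerts": int};
    these metric names are hypothetical.
    """
    totals = {"running": 0, "desired": 0, "alerts": 0}
    degraded = []
    for name, stats in sorted(per_cluster.items()):
        for key in totals:
            totals[key] += stats.get(key, 0)
        if stats.get("running", 0) < stats.get("desired", 0):
            degraded.append(name)
    return totals, degraded


print(summarize({
    "on-prem": {"running": 10, "desired": 10, "alerts": 0},
    "gce": {"running": 7, "desired": 9, "alerts": 2},
    "aws": {"running": 5, "desired": 5, "alerts": 1},
}))
# ({'running': 22, 'desired': 24, 'alerts': 3}, ['gce'])
```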
21. API Compatible with Kubernetes
• Less new stuff to learn
• Can learn incrementally, as you need new functionality.
• Analogous argument applies to existing automation systems (PaaS, etc.).
  • These can be ported to Ubernetes relatively easily.
• All Kubernetes entities are "federatable".
Diagram: a client tells Ubernetes or Kubernetes to "Run my stuff", and it runs the applications
22. State and control reside in underlying clusters (for the most part)
• Better scalability
  • Kubernetes scales with number of nodes per cluster (<10,000)
  • Ubernetes scales with number of clusters (~100)
• Better fault isolation
  • Kubernetes clusters fail independently of Ubernetes
Diagram: Ubernetes sits above several Kubernetes clusters, each with its own API, replication controllers etc., and state
23. Similar control loops to Kubernetes
• Drive current state -> desired state
  • But per-cluster state, not per node, per pod etc.
• Observed state is the truth
Diagram: observe → diff → act, a recurring pattern in the system
Examples:
• ReplicationController
• Service
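One pass of the observe → diff → act loop, operating on per-cluster replica counts rather than individual pods, can be sketched as below. `reconcile_once`, `observe` and `set_replicas` are hypothetical; a plain dict stands in for the member clusters' APIs.

```python
def reconcile_once(desired, observe, set_replicas):
    """One observe -> diff -> act pass over per-cluster replica counts.

    desired: {cluster: replicas} for one federated workload.
    observe(): returns the current {cluster: replicas}; it is re-read on every
    pass because observed state is the truth.
    set_replicas(cluster, n): asks that cluster to run n replicas.
    Returns {cluster: change_requested}.
    """
    observed = observe()
    changes = {}
    for cluster in sorted(desired.keys() | observed.keys()):
        delta = desired.get(cluster, 0) - observed.get(cluster, 0)
        if delta:
            set_replicas(cluster, desired.get(cluster, 0))
            changes[cluster] = delta
    return changes


# Simulated clusters: a plain dict stands in for each cluster's own API.
state = {"cluster-1": 1, "cluster-2": 5, "cluster-3": 2}
desired = {"cluster-1": 3, "cluster-2": 5}
print(reconcile_once(desired, lambda: dict(state), state.__setitem__))
# {'cluster-1': 2, 'cluster-3': -2}
print(state)
# {'cluster-1': 3, 'cluster-2': 5, 'cluster-3': 0}
```

Running the loop again is a no-op once observed and desired state agree, which is what makes it safe to run repeatedly.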
24. Modularity
Loose coupling is a goal everywhere
• simpler
• composable
• extensible
Code-level plugins where possible
Multi-process where possible
Isolate risk by interchangeable parts
Examples:
• MigrationController
• Scheduler
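The "interchangeable parts" idea can be illustrated with a scheduler that delegates placement to a pluggable policy. This is a hypothetical plugin shape (`PlacementPolicy`, `SpreadEvenly`, `PackIntoFirst`, `Scheduler`), not the actual Ubernetes scheduler interface.

```python
from typing import Dict, List, Protocol


class PlacementPolicy(Protocol):
    """Hypothetical plugin interface: any object with this method can be used."""

    def place(self, replicas: int, clusters: List[str]) -> Dict[str, int]: ...


class SpreadEvenly:
    def place(self, replicas, clusters):
        base, extra = divmod(replicas, len(clusters))
        return {c: base + (1 if i < extra else 0) for i, c in enumerate(clusters)}


class PackIntoFirst:
    def place(self, replicas, clusters):
        return {clusters[0]: replicas}


class Scheduler:
    def __init__(self, policy: PlacementPolicy):
        self.policy = policy  # interchangeable part: swap without touching the scheduler

    def schedule(self, replicas, clusters):
        return self.policy.place(replicas, clusters)


clusters = ["gce", "aws", "on-prem"]
print(Scheduler(SpreadEvenly()).schedule(5, clusters))   # {'gce': 2, 'aws': 2, 'on-prem': 1}
print(Scheduler(PackIntoFirst()).schedule(5, clusters))  # {'gce': 5}
```

Because the scheduler only depends on the `place` method, a risky new policy can be tried in isolation and swapped back out without touching the rest of the system.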
25. Federation status & plans
Federation Lite (single cluster, multiple zones)
• In alpha Q4 2015
• Productionized ~Q1 2016
Federation Proper (multiple clusters, federated)
• Alpha Q1 2016
Google Container Engine (GKE)
• hosted Federation too
• GKE Federation Lite ~Q1-Q2 2016
PaaSes and Distros
• Red Hat OpenShift, CoreOS Tectonic, Red Hat Atomic...
• ... watch this space...
26. I want more!
• Requirements doc - comments welcome
• tinyurl.com/ubernetesv2
• Special interest group
• groups.google.com/forum/kubernetes-sig-federation
• quinton@google.com
• quinton_hoole@github