Milking the most out of thousands of Kubernetes clusters
What to expect from the session
• Intro
• How is CFA using K8s?
• What does our architecture look like?
• How are we engineering around K8s for our business?
• Q&A
Internet of Things: Why?
AT PEAK HOUR
1 sandwich every 16 seconds
1 box of nuggets every 25 seconds
1 order of waffle fries every 14 seconds
1 car through the drive thru every 22 seconds
267 total transactions
Chick-fil-A Architecture (2017)
[Architecture diagram. Labels: Edge, Cloud, Things, MSGing, Web Server, Local Auth, Event Fwding, Apps, …, Local Persistence/Storage, Connectivity, Analytics, Management, OAuth Server, MQTT, Edge Tools]
Chick-fil-A Architecture (Today)
[Architecture diagram. Labels: Edge, Cloud, Things, MSGing, Local Auth, Event Fwd, Apps, …, Local Persistence/Storage, Connectivity, Analytics, Management, OAuth Server, MQTT, Fleet]
Why Containers? Why Kubernetes?
[Diagram. Labels: Idea, Code, Production Code, Value, Impact, Optimize for, Accelerate]
North American Data Centers
Google Cloud
AWS
Azure
North American Data Centers
Google Cloud
AWS
Azure
Cloud-fil-A
Restaurant “Data Centers”
Intel: Quadcore processor, 8 GB RAM, SSD
Engineering Around K8s
• How we build and repair bare metal clusters
• SRE Lessons Learned
• How we deploy applications to thousands of clusters
Challenges of Bare Metal K8s clustering at scale
• Goal: #code2prod
• Simple enough for a non-technologist to install
• Manageable remotely
• Automated device discovery and self-clustering
• Self healing & HA
How we Bare Metal Cluster K8s at scale
[Diagram. Headings: TOOLS, PROCESS. Labels: Highlander, Hooves Up, Sherlock, Fleet, RKE, Image]
Bootstrapping Clusters
• Highlander
– Node coordination and clustering leader election using UDP
– Execute clustering (RKE)
– Swap KubeDNS for CoreDNS
– Base OAuth identity negotiation
– Controller Pods (control plane activity/Istio)
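
The slides do not describe Highlander's wire protocol, so the following is only a minimal sketch of UDP-style leader election under stated assumptions: each node broadcasts a small JSON announcement carrying its ID, and every node applies the same deterministic rule (highest priority, then lowest ID) to the announcements it has received. The message fields and the function names `encode_announcement`, `decode_announcement`, and `elect_leader` are hypothetical.

```python
import json


def encode_announcement(node_id: str, priority: int = 0) -> bytes:
    """Serialize the datagram payload a node would broadcast over UDP (assumed format)."""
    return json.dumps({"node_id": node_id, "priority": priority}).encode("utf-8")


def decode_announcement(payload: bytes) -> dict:
    """Parse a received datagram; raises ValueError on malformed input."""
    message = json.loads(payload.decode("utf-8"))
    if "node_id" not in message:
        raise ValueError("announcement is missing node_id")
    message.setdefault("priority", 0)
    return message


def elect_leader(payloads: list[bytes]) -> str:
    """Pick a leader deterministically: highest priority first, then lowest node_id.

    Every node that sees the same set of announcements reaches the same answer,
    so no extra coordination round is needed in this simplified model.
    """
    if not payloads:
        raise ValueError("no announcements received")
    candidates = [decode_announcement(p) for p in payloads]
    winner = min(candidates, key=lambda m: (-m["priority"], m["node_id"]))
    return winner["node_id"]
```

In a real deployment the payloads would arrive via `socket.recvfrom` on a UDP socket during a fixed listening window. UDP gives no delivery guarantee, so nodes may see different sets of announcements, and a production design would need retries or a confirmation step.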
Initializing Clusters
What we considered
• Kops = love it, no bare metal
• Kubespray = slow + brittle
• kubeadm = maybe in the future
• RKE = fairly simple, works for us
Future State?
• Stick w/ RKE, kubeadm, or roll our own to meet our needs
Resetting Cluster State
• Requirement: Need to be able to re-image remotely
• Solution: Overlay FS + HAMS
– Manages wiping clusters and restoring to base
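
The slide gives no detail on HAMS, so this is only a sketch of the general overlay-filesystem idea it relies on: the base image lives in a read-only lower layer, all changes land in a writable upper layer, and "restoring to base" amounts to emptying the writable layers. The directory layout and the function name `reset_overlay` are assumptions, and the example works on plain directories rather than a mounted overlay.

```python
import shutil
from pathlib import Path


def reset_overlay(upper: Path, work: Path) -> int:
    """Discard everything written since imaging by emptying the writable layers.

    Returns the number of top-level entries removed from the upper layer.
    The read-only lower layer (the base image) is never touched.
    """
    removed = 0
    for layer in (upper, work):
        layer.mkdir(parents=True, exist_ok=True)
        for entry in layer.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            if layer == upper:
                removed += 1
    return removed
```

On a real node this step would typically run while the overlay is not mounted, for example early in boot, since changing the upper layer underneath a live overlay mount is not safe.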
Hooves Up
• Self-healing AWS SSM Registration
• Free even for non-AWS deployments
• Able to do remote commands and patch reporting/management
Lessons learned
• Use K8s feature set and don’t reinvent the wheel
• MVP. MVP. MVP.
• Ensure aggregated and searchable logging
• Deep health checks are a must --> Use /healthz
• Every service needs a “/metrics” endpoint
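
As an illustration of the last two lessons, here is a minimal sketch of a service exposing `/healthz` and `/metrics` using only the Python standard library. The "deep" part is that `/healthz` probes the service's dependencies instead of just returning 200. The probe names, the counter name `http_requests_total`, and the handler layout are assumptions for the example, not anything shown in the talk.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable

# Hypothetical dependency probes; a real service would ping its database, broker, etc.
CHECKS: dict[str, Callable[[], bool]] = {
    "storage": lambda: True,
    "mqtt": lambda: True,
}
REQUEST_COUNT = {"healthz": 0, "metrics": 0}


def deep_health(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict[str, bool]]:
    """Run every probe; any failure or exception makes the service report 503."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results


def render_metrics(counts: dict[str, int]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = ["# TYPE http_requests_total counter"]
    for path, value in sorted(counts.items()):
        lines.append(f'http_requests_total{{path="/{path}"}} {value}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            REQUEST_COUNT["healthz"] += 1
            status, results = deep_health(CHECKS)
            body, ctype = json.dumps(results).encode(), "application/json"
        elif self.path == "/metrics":
            REQUEST_COUNT["metrics"] += 1
            status, ctype = 200, "text/plain; version=0.0.4"
            body = render_metrics(REQUEST_COUNT).encode()
        else:
            status, body, ctype = 404, b"not found\n", "text/plain"
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, serve the handler with `http.server.HTTPServer(("", 8080), Handler).serve_forever()`. Kubernetes liveness and readiness probes can then point at `/healthz`.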
How do we deploy to our restaurants?
• Large number of deployment targets
• Complex success/fail criteria
• Array of application types
• What approaches did we consider?
kubectl
/
Introducing Fleet
• Design Goals
– Simple to use / reason about
– Use declarative approach
– Support for variety of deployment models (canary, blue/green)
– Rollout over flexible time period
– Sane rollback behaviors
– Leverage standard k8s API
– Full visibility
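
To make the canary and "rollout over flexible time period" goals concrete, here is a rough sketch of how a fleet of clusters could be split into progressively larger waves, with a success gate between waves. This is not Fleet's actual algorithm, which the slides do not show; `plan_waves`, `should_continue`, the cumulative percentages, and the 95% threshold are all hypothetical.

```python
def plan_waves(clusters: list[str], cumulative_percent: list[int]) -> list[list[str]]:
    """Split clusters into waves so that wave k brings coverage up to cumulative_percent[k].

    Integer arithmetic (rounding up) avoids floating-point surprises at the boundaries.
    """
    total = len(clusters)
    waves, done = [], 0
    for percent in cumulative_percent:
        target = min(total, (percent * total + 99) // 100)
        if target > done:
            waves.append(clusters[done:target])
            done = target
    return waves


def should_continue(succeeded: int, attempted: int, min_success_rate: float = 0.95) -> bool:
    """Gate the next wave on the success rate of the previous one."""
    if attempted == 0:
        return True
    return succeeded / attempted >= min_success_rate
```

For 200 restaurants and cumulative targets of 1%, 10%, and 100%, this yields waves of 2, 18, and 180 clusters. A rollback policy would hook in wherever `should_continue` returns `False`.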
Fleet Ecosystem Components
• Fleet Client
– Git webhook, REST call, CLI
• Fleet Server API
– Code generation for deployment, service, ingress files
– Git management for cluster repositories
– Deployment status tracking
• Atlas
– Repository of deploy-ready, k8s compliant application files
• Vessel
– Deployed on cluster, git pull, kubectl apply, report status
• Dashboards
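
The Vessel description (git pull, kubectl apply, report status) suggests a simple in-cluster sync loop. The sketch below shows one iteration of such a loop under assumptions: the manifests live in a local clone at `repo_dir`, and the returned status dictionary is what would be reported back. `sync_once`, `run_command`, and the status fields are hypothetical names. The command runner is injectable so the logic can be exercised without git or kubectl installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def sync_once(repo_dir: str, runner: Runner = run_command) -> dict:
    """Pull the cluster's repository, apply its manifests, and return a status report."""
    steps = [
        ("git pull", ["git", "-C", repo_dir, "pull", "--ff-only"]),
        ("kubectl apply", ["kubectl", "apply", "-f", repo_dir]),
    ]
    for name, argv in steps:
        code = runner(argv)
        if code != 0:
            # Stop at the first failing step so the report points at the cause.
            return {"status": "failed", "step": name, "exit_code": code}
    return {"status": "applied", "step": None, "exit_code": 0}
```

Because the cluster pulls its desired state rather than having it pushed, a restaurant that is temporarily offline simply catches up on its next successful pull.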
Sample Templates
Fleet Walk Thru/Demo PLACEHOLDER
Application Configuration
HTTP POST Request
K8s config example
Atlas
Fleet Walk Thru/Demo PLACEHOLDER
Where you can find us
www.linkedin.com/in/brian-chambers
www.linkedin.com/in/calebrhurd
@brianchambers21
@calebrhurd
https://medium.com/@cfatechblog
https://github.com/chick-fil-a
