Journey of Kubernetes Scaling
Code Mania 111 @ Siam University
June 10, 2018
#whoami
● Setthasarun Prasanpun (Beer)
● Former PHP developer
● DevOps Engineer @ Opsta
#whoami
● Jirayut Nimsaeng (Dear)
● Interested in Cloud and Open Source
● Agile Practitioner, DevOps Driven
● CEO and Founder, Opsta
Agenda
● What is Docker and Kubernetes?
● Batch Processing
● Solution to scale Batch Processing
● Optimization
● Benchmark
● Future
What is a Docker Container?
One Server
[Diagram: a single Node running a Container]
Multiple Servers
[Diagram: Node 1, Node 2, Node 3 and a Container marked "???"]
Kubernetes Automatic Bin Packing
[Diagram: kube-scheduler places two Service A containers and one Service B container across Node 1, Node 2, Node 3]
Some more features on Kubernetes
● Self-healing
● Service discovery & load balancing
● Automated rollouts and rollbacks
● Secret and configuration management
● Storage orchestration
● Batch execution
● Horizontal manual/auto-scaling
Batch Processing
[Diagram: Users submit Jobs to a Queue; Workers consume Jobs from the Queue and produce a Result]
Challenge
[Diagram: Users -> API -> (push) Queue of Jobs -> (consume) Workers -> DB]
First Design on AWS
[Diagram: Users, API, SQS as the queue, Workers, DB]
Problem
[Diagram: Users, API, SQS, Workers, DB, with far more Users than Workers]
60,000 queued jobs!!!
Solution with Elastic Beanstalk
[Diagram: API -> SQS -> Elastic Beanstalk Container; an Auto Scaling Instance Group of EC2 instances, each running Sqsd and one Worker]
Scale condition is set by CPU utilization
Problems
- CPU utilization is not a good metric for the autoscaling condition
- Each EC2 instance contains only 1 Worker container
- The EC2 instance spec does not fit what the worker requires, which wastes resources
- Very slow to scale up; Auto Scaling isn't really intended for bursting
Kubernetes Solution
[Diagram: Users, API, SQS, Workers, DB]
Solution with Kubernetes
[Diagram: SQS feeding 3 WORKER pods on a Kubernetes Cluster of Node1, Node2, Node3]
Scale Pod with Kubernetes
[Diagram: SQS feeding 6 WORKER pods on a Kubernetes Cluster of Node1, Node2, Node3]
Scale Node with Kubernetes
[Diagram: SQS feeding 8 WORKER pods on a Kubernetes Cluster of Node1, Node2, Node3, Node4]
What needs to be done
● Change the code so it does not depend on Sqsd
● Build a Kubernetes cluster on AWS
● Find a solution to automatically scale pods and nodes
Scale Pod with kube-sqs-autoscaler
● https://github.com/Wattpad/kube-sqs-autoscaler
● Pod autoscaler based on queue size in AWS SQS
● Periodically retrieves the number of messages in SQS and scales pods according to its configuration
○ --scale-down-cool-down=30s
○ --scale-up-cool-down=5m
○ --scale-up-messages=100
○ --scale-down-messages=10
○ --max-pods=5
○ --min-pods=1
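The threshold flags above can be read as a simple decision rule. The sketch below is an illustration in Python, not the tool's actual code: the names `ThresholdConfig` and `next_replicas` are hypothetical, the one-pod step is assumed from the later remark that scaling 1 pod at a time is too slow, and the two cool-down timers are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ThresholdConfig:
    # Values mirror the flags shown on the slide.
    scale_up_messages: int = 100
    scale_down_messages: int = 10
    max_pods: int = 5
    min_pods: int = 1


def next_replicas(queue_length: int, current: int, cfg: ThresholdConfig) -> int:
    """Return the replica count after one polling cycle (cool-downs omitted)."""
    if queue_length >= cfg.scale_up_messages:
        # Queue is long: add one pod, but never exceed max_pods.
        return min(current + 1, cfg.max_pods)
    if queue_length <= cfg.scale_down_messages:
        # Queue is nearly empty: remove one pod, but never go below min_pods.
        return max(current - 1, cfg.min_pods)
    # Between the two thresholds: leave the deployment alone.
    return current
```

Under this rule, a queue of 150 messages with 3 pods yields 4 pods on the next cycle, however long the queue is.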
SQS Autoscaling Pods (1)
[Diagram: SQS Autoscale watching SQS with 10 queued jobs; 3 WORKER pods on a Kubernetes Cluster of Node1, Node2, Node3]
SQS Autoscaling Pods (2)
[Diagram: SQS Autoscale watching SQS with 5 queued jobs; 4 WORKER pods on a Kubernetes Cluster of Node1, Node2, Node3]
SQS Autoscaling Pods (3)
[Diagram: SQS Autoscale watching SQS with 0 queued jobs; 5 WORKER pods on a Kubernetes Cluster of Node1, Node2, Node3]
Scale Node with OpenAI
● https://github.com/openai/kubernetes-ec2-autoscaler
● Works with an AWS Auto Scaling Group to scale instances up and down
● Scales nodes up when pods are in Pending status and no node has free capacity left
● Scales nodes down by checking for idle CPU
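The scale-up condition above (pending pods and no free capacity left) can be sketched as a small check. This is a simplified illustration under stated assumptions, not the autoscaler's real logic: the function names are hypothetical, capacity is modelled as a single number per node (for example CPU millicores), and pods are placed first-fit.

```python
def unschedulable(pending_requests, free_capacity):
    """Return the pending pod requests that fit on no existing node (first-fit)."""
    free = list(free_capacity)  # copy so the caller's list is not modified
    leftover = []
    for request in pending_requests:
        for i, capacity in enumerate(free):
            if request <= capacity:
                free[i] = capacity - request  # reserve room on this node
                break
        else:
            # No node had enough room for this pod.
            leftover.append(request)
    return leftover


def should_scale_up(pending_requests, free_capacity):
    """Scale up only when at least one pending pod cannot be placed."""
    return len(unschedulable(pending_requests, free_capacity)) > 0
```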
Scale Node with OpenAI
[Diagram: 3 WORKER pods running on a Kubernetes Cluster of Node1, Node2, Node3 and 3 more WORKER pods PENDING; the EC2 Autoscaler sits between the cluster and the Auto Scaling Instance Group]
Scale Node with OpenAI
[Diagram: the EC2 Autoscaler adds Node4 through the Auto Scaling Instance Group; WORKER pods now run on Node1, Node2, Node3, Node4]
Optimization
Enhance kube-sqs-autoscaler
● Scaling 1 pod at a time is too slow!
● So we improved the kube-sqs-autoscaler code to scale pods by the ratio between SQS messages and pods
○ --scale-by-ratio
○ --queue-per-pod-ratio=100
○ --scale-down-cool-down=30s
○ --scale-up-cool-down=5m
○ --max-pods=5
○ --min-pods=1
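One way to read the ratio flags is that the target pod count follows the queue length directly. The sketch below assumes the target is the queue length divided by `--queue-per-pod-ratio`, rounded up and clamped to the pod limits; the function name and the exact rounding are assumptions, since the slides do not show the modified code.

```python
import math


def replicas_by_ratio(queue_length: int,
                      queue_per_pod_ratio: int = 100,
                      min_pods: int = 1,
                      max_pods: int = 5) -> int:
    """Target pod count for a queue length, jumping straight to the needed size."""
    # One pod per `queue_per_pod_ratio` messages, rounded up.
    wanted = math.ceil(queue_length / queue_per_pod_ratio)
    # Keep the result inside the configured pod limits.
    return max(min_pods, min(wanted, max_pods))
```

With these defaults, 250 queued messages map to 3 pods in a single step instead of three separate one-pod steps.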
Move from OpenAI to autoscaler
● https://github.com/kubernetes/autoscaler
● The OpenAI autoscaler lacks active development since its developers moved from AWS to Azure
● The OpenAI autoscaler does not support multiple instance groups
● Autoscaler is more mature since it is one of the Kubernetes components
Worker parallel optimization
- Each worker consumed only 1 job at a time.
- CPU usage was below 15% but memory went up to ~35% per worker on a node, which was not good for us.
- We improved our worker to consume and process multiple jobs simultaneously (configurable setting).
- After some trials, a worker can run 5 concurrent jobs with the same processing time, using more CPU and a bit more memory.
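A worker with configurable concurrency can be sketched as follows. This is a minimal illustration rather than the team's worker: an in-memory `queue.Queue` stands in for SQS, `handler` is a hypothetical job function, and threads are assumed to be adequate for the workload.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_worker(jobs: "queue.Queue", handler, concurrency: int = 5) -> list:
    """Drain `jobs`, processing up to `concurrency` jobs at the same time."""

    def drain():
        done = []
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                return done  # nothing left for this thread to consume
            done.append(handler(job))

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Each thread keeps pulling jobs until the queue is empty.
        futures = [pool.submit(drain) for _ in range(concurrency)]
        return [result for future in futures for result in future.result()]
```

Results come back grouped by thread, so the order is not guaranteed to match the order in which jobs were queued.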
Worker CPU optimization
- Our worker uses TensorFlow installed via pip.
- TensorFlow warned that the library wasn't compiled to use AVX and SSE4.1 instructions, although these are available on the machine. The pip build is not compiled for these CPU instructions.
- So we built TensorFlow with all CPU instructions available on the EC2 (t2.medium) machine.
- Result: jobs are processed about 35% faster!
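Before rebuilding, it helps to confirm which instruction sets the machine reports. The sketch below parses the `flags` line of Linux `/proc/cpuinfo` text; the function names are hypothetical, and `avx` and `sse4_1` are the flag spellings Linux uses for AVX and SSE4.1.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... avx"; keep the part after ":".
            return set(line.split(":", 1)[1].split())
    return set()


def supported(cpuinfo_text: str, wanted=("avx", "sse4_1")) -> dict:
    """Map each wanted flag to whether the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}
```

On a Linux host, the input can be read with `open("/proc/cpuinfo").read()`.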
Benchmark
Benchmark questions
● How do we run the load test?
○ Python script sending 5,000 requests (200 concurrent users x 25 requests per user) within 1 minute
● Which instance size is the most cost-effective?
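The load shape described above (200 concurrent users, 25 requests each, 5,000 in total) can be sketched with a thread pool. The original script is not shown, so this is only an illustration: `send` is a hypothetical callable that performs one request, injected so the sketch runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send, users: int = 200, requests_per_user: int = 25) -> list:
    """Run `users` simulated users in parallel, each sending requests in sequence."""

    def one_user(user_id):
        # Each simulated user sends its requests one after another.
        return [send(user_id, n) for n in range(requests_per_user)]

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    # Flatten to one list with users * requests_per_user results.
    return [result for batch in per_user for result in batch]
```

In a real run, `send` would post one job to the API and return the status code, and the caller would time the whole call to check that it finishes within the 1-minute window.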
Benchmark Result Graph
[Graph: t2.medium wins at 1,570 queued jobs/minute]
Benchmark result
● Worker scaling speed:
○ EB: 5-10 mins per worker instance
○ K8s: <2 mins (node available, use free node)
○ K8s: <5 mins (node not available, spin up a new one)
Conclusions
● K8s is flexible for batch processing jobs
● K8s has many components for autoscaling
● K8s helps us optimize resources cost-effectively
● K8s can finish 60,000 queued jobs in 10 mins
Future
● Use Kubernetes with AWS GPU Instance
● Change Queue
○ RabbitMQ
○ Kafka
● Optimize cost with AWS Spot Instance
Q/A

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Journey of Kubernetes Scaling

  • 1. Journey of Kubernetes Scaling Code Mania 111 @ Siam University June 10, 2018
  • 2. #whoami ● Setthasarun Prasanpun (Beer) ● Former PHP developer ● DevOps Engineer @ Opsta
  • 3. #whoami ● Jirayut Nimsaeng (Dear) ● Interested in Cloud and Open Source ● Agile Practitioner with DevOps Driven ● CEO and Founder, Opsta
  • 4. Agenda ● What are Docker and Kubernetes? ● Batch Processing ● Solution to scale Batch Processing ● Optimization ● Benchmark ● Future
  • 5. What is a Docker Container?
  • 6. One Server (diagram: one Container on a single Node)
  • 7. Multiple Servers (diagram: Node 1, Node 2, Node 3, one Container, "???")
  • 8. Kubernetes Automatic Bin Packing (diagram: kube-scheduler places two Service A containers and one Service B container across Node 1, Node 2, Node 3)
  • 9. Some more Kubernetes features ● Self-healing ● Service discovery & load balancing ● Automated rollouts and rollbacks ● Secret and configuration management ● Storage orchestration ● Batch execution ● Horizontal manual/auto-scaling
  • 10. Batch Processing (diagram: Users submit Jobs to a Queue; Workers consume Jobs and produce a Result)
  • 11. Challenge (diagram: Users, API, Queue, Jobs, Workers, DB; arrows labeled Push and Consume)
  • 12. First Design on AWS (diagram: Users, API, SQS, Workers, DB)
  • 13. Problem (diagram: the same design with many more Users) 60,000 QUEUES!!!
  • 14. Solution with Elastic Beanstalk (diagram: API, SQS, and an Elastic Beanstalk Container whose Auto Scaling Instance Group holds three EC2 instances, each running Sqsd and a Worker) Set scale condition by CPU utilization
  • 15. Problems - CPU utilization is not a good metric for the autoscaling condition - Each EC2 instance contains only 1 Worker container - The EC2 spec does not fit the worker's requirements, which wastes resources - Very slow to scale up; Auto Scaling isn't really intended for bursting
  • 16. Kubernetes Solution (diagram: Users, API, SQS, Workers, DB)
  • 17. Solution with Kubernetes (diagram: SQS feeding 3 Workers on Node1, Node2, Node3 in a Kubernetes Cluster)
  • 18. Scale Pod with Kubernetes (diagram: SQS feeding 6 Workers on Node1, Node2, Node3 in a Kubernetes Cluster)
  • 19. Scale Node with Kubernetes (diagram: SQS feeding 8 Workers on Node1, Node2, Node3, Node4 in a Kubernetes Cluster)
  • 20. What needs to be done ● Change the code so it does not depend on Sqsd ● Build a Kubernetes Cluster on AWS ● Find a solution to automatically scale pods and nodes
  • 21. Scale Pod with kube-sqs-autoscaler ● https://github.com/Wattpad/kube-sqs-autoscaler ● Pod autoscaler based on queue size in AWS SQS ● Periodically retrieves the number of messages in SQS and scales pods according to its configuration ○ --scale-down-cool-down=30s --scale-up-cool-down=5m --scale-up-messages=100 --scale-down-messages=10 --max-pods=5 --min-pods=1
  • 22. SQS Autoscaling Pods (1) (diagram: SQS with 10 QUEUES; SQS Autoscale; 3 Workers on Node1, Node2, Node3 in a Kubernetes Cluster)
  • 23. SQS Autoscaling Pods (2) (diagram: SQS with 5 QUEUES; SQS Autoscale; 4 Workers on Node1, Node2, Node3 in a Kubernetes Cluster)
  • 24. SQS Autoscaling Pods (3) (diagram: SQS with 0 QUEUES; SQS Autoscale; 5 Workers on Node1, Node2, Node3 in a Kubernetes Cluster)
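
The threshold-and-cooldown behaviour described on slide 21 can be sketched roughly as follows. This is a minimal illustration, not the kube-sqs-autoscaler source: the names `ScalerConfig` and `next_replicas` are hypothetical, the defaults mirror the flags on the slide, and the one-pod-per-step behaviour is taken from the later remark that scaling 1 pod at a time was too slow.

```python
from dataclasses import dataclass


@dataclass
class ScalerConfig:
    # Defaults mirror the flags shown on slide 21 (durations in seconds).
    scale_up_messages: int = 100
    scale_down_messages: int = 10
    scale_up_cooldown_s: float = 300.0   # --scale-up-cool-down=5m
    scale_down_cooldown_s: float = 30.0  # --scale-down-cool-down=30s
    min_pods: int = 1
    max_pods: int = 5


def next_replicas(cfg, current, queue_len, now, last_up, last_down):
    """Return the replica count after one polling cycle (hypothetical helper).

    Moves by at most one pod per cycle and only when the matching
    cooldown has elapsed since the last scale event in that direction.
    """
    if queue_len >= cfg.scale_up_messages and now - last_up >= cfg.scale_up_cooldown_s:
        return min(current + 1, cfg.max_pods)
    if queue_len <= cfg.scale_down_messages and now - last_down >= cfg.scale_down_cooldown_s:
        return max(current - 1, cfg.min_pods)
    return current
```

In a real deployment the queue length would come from SQS and the result would be applied to a Deployment's replica count; both are left out here to keep the sketch self-contained.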
  • 25. Scale Node with OpenAI ● https://github.com/openai/kubernetes-ec2-autoscaler ● Works with an AWS Auto Scaling Group to scale instances up and down ● Scales nodes up when a pod is in Pending status and no node has free capacity left ● Scales nodes down by checking whether the CPU is idle
  • 27. Scale Node with OpenAI (diagram: SQS; 3 Workers on Node1, Node2, Node3 in a Kubernetes Cluster; 3 more Workers PENDING; EC2 Autoscaler; Auto Scaling Instance Group)
  • 28. Scale Node with OpenAI (diagram: SQS; Workers on Node1, Node2, Node3 and on a new Node4 in a Kubernetes Cluster; EC2 Autoscaler; Auto Scaling Instance Group)
  • 29. Optimization
  • 30. Enhance kube-sqs-autoscaler ● Scaling 1 Pod at a time is too slow! ● So we improved the kube-sqs-autoscaler code to scale pods by the ratio between SQS messages and pods ○ --scale-by-ratio --queue-per-pod-ratio=100 --scale-down-cool-down=30s --scale-up-cool-down=5m --max-pods=5 --min-pods=1
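
One way to read the ratio-based scaling on slide 30 is: aim for one pod per `queue-per-pod-ratio` messages, clamped between `min-pods` and `max-pods`. The sketch below assumes that reading; the function name `desired_pods` and the choice to round up are assumptions, since the slide does not show the authors' actual change.

```python
import math


def desired_pods(queue_len, queue_per_pod_ratio=100, min_pods=1, max_pods=5):
    """Target pod count for a given queue length (hypothetical helper).

    Assumes one pod per `queue_per_pod_ratio` queued messages, rounded up,
    then clamped to the [min_pods, max_pods] range from the slide's flags.
    """
    wanted = math.ceil(queue_len / queue_per_pod_ratio)
    return max(min_pods, min(max_pods, wanted))
```

With the slide's settings, a queue of 250 messages would jump straight to 3 pods instead of climbing one pod per scale-up cooldown.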
  • 31. Move from OpenAI to autoscaler ● https://github.com/kubernetes/autoscaler ● OpenAI's autoscaler lacks active development since its developers moved from AWS to Azure ● OpenAI's autoscaler does not support multiple instance groups ● Autoscaler is more mature since it is one of the Kubernetes components
  • 32. Worker parallel optimization - Each worker consumed only 1 job at a time. - CPU usage was below 15% while memory reached ~35% per worker on a node, which was not good for us. - We improved our worker to consume and process multiple jobs simultaneously (configurable setting). - After some trials, a worker can run 5 concurrent jobs with the same processing time, using more CPU and a bit more memory.
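
The change on slide 32, one worker processing several jobs at once with a configurable limit, might look something like the sketch below. The slides do not show the worker's code, so `process_job` is a stand-in and the thread pool is only one possible approach; whether threads or processes suit a given worker depends on the workload.

```python
from concurrent.futures import ThreadPoolExecutor


def process_job(job):
    """Hypothetical stand-in for the real job handler."""
    return job * 2


def run_batch(jobs, concurrency=5):
    """Process jobs with up to `concurrency` running at the same time.

    `concurrency` plays the role of the configurable setting mentioned
    on the slide; results come back in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(process_job, jobs))
```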
  • 33. Worker CPU optimization - Our worker uses TensorFlow installed via pip - TensorFlow warns that the library wasn't compiled to use AVX and SSE4.1 instructions, although these are available on the machine. The pip version is not built for any CPU-specific instructions - So we built TensorFlow with all CPU instructions available on the EC2 (t2.medium) machine. - Result: jobs are processed about 35% faster!!!
  • 34. Benchmark
  • 35. Benchmark questions ● How to do the load test? ○ Python script sending 5,000 requests (200 concurrent users x 25 requests per user) within 1 minute ● What is the most cost-effective instance size?
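
The load-test shape on slide 35 (200 concurrent users, 25 requests each, 5,000 in total) can be sketched as below. The original script is not shown in the slides, so this is only an illustration: `run_load_test` and the injected `send` callable are hypothetical, and a real run would pass a function that posts a job to the API and reports success.

```python
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send, users=200, requests_per_user=25):
    """Simulate `users` concurrent users, each sending requests in sequence.

    `send(user_id, request_index)` is a caller-supplied function
    (hypothetical) that returns True on success. Returns the number
    of successful requests.
    """
    def one_user(user_id):
        ok = 0
        for i in range(requests_per_user):
            if send(user_id, i):
                ok += 1
        return ok

    # One thread per simulated user so all users run concurrently.
    with ThreadPoolExecutor(max_workers=users) as pool:
        return sum(pool.map(one_user, range(users)))
```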
  • 36. Benchmark Result Graph: t2.medium wins @ 1,570 queues/minute
  • 37. Benchmark result ● Worker scaling speed: ○ EB: 5-10 mins per worker instance ○ K8S: <2 mins (node available, use free node); <5 mins (node not available, spin up new)
  • 38. Conclusions ● K8s is flexible for batch processing jobs ● K8s has many components for autoscaling ● K8s helps us optimize resources cost-effectively ● K8s can finish 60,000 queues in 10 mins
  • 39. Future ● Use Kubernetes with AWS GPU Instances ● Change Queue ○ RabbitMQ ○ Kafka ● Optimize cost with AWS Spot Instances
  • 40. Q/A