2. Sumo Logic confidential
• Principal Development Engineer at DellEMC
• 1st half of my career was in CGI & VMware
• 2nd half of my career has been in System Integration Testing
• Docker Captain (since 2016)
• Docker Bangalore Meetup Organizer (8800+ Registered Users)
• DockerLabs Incubator ~ 1700+ Slack Members
• Frequent Blogger – www.collabnix.com
Ajeet Singh Raina
Twitter: @ajeetsraina
GitHub: ajeetraina
3. Sumo Logic confidential
Suresh Govindachetty
• Enterprise Sales Engineer at Sumo Logic
• Formerly with Citrix, HPE, Nortel
• Mostly in Presales, Networking and Security
4. Sumo Logic confidential
Massive shift in monitoring requirements: from host-based monitoring to “container-specific & service-oriented monitoring”
5. Sumo Logic confidential
Containers & Kubernetes: The New Reality
[Diagram: App and Server layers shown for Traditional Software Architecture, Containerized Architecture, and Orchestrated Containerized Architecture]
11. Sumo Logic confidential
Current Challenges in Kubernetes Monitoring & Troubleshooting
K8s is powerful… but complex!
$ kubectl create -f web.yaml
12. Sumo Logic confidential
Current Challenges in Kubernetes Monitoring & Troubleshooting
K8s is powerful… but complex!
Everything in K8s is, by design, ephemeral
13. Sumo Logic confidential
Current Challenges in Kubernetes Monitoring & Troubleshooting
K8s is powerful… but complex!
Cascading Failures
- Container Communication
- Increased Dependencies
- Changing Architecture
14. Sumo Logic confidential
Current Challenges in Kubernetes Monitoring & Troubleshooting
K8s is powerful… but complex!
More & Noisy Metrics (100x)
- Container Unique Metrics
- Ephemeral Data
- False Positives
15. Sumo Logic confidential
Methodology Switch
Cattle (Containers)
o Named with strings of numbers
o Almost identical
o Ephemeral
o Sick: get a new one
Pets (K8s Services)
o 1 or more identical Pods
o Specific name (kube_app, kube_name)
o Give context to container metrics
o Sick: nurse back to health
16. Sumo Logic confidential
Visualizing Kubernetes Objects
[Diagram: a Namespace containing Service A, Service B, and Service C, with Pods C1, C2, and C3 and their Containers; Services are labeled "Pet" and Containers are labeled "Cattle"]
18. Sumo Logic confidential
K8s Metrics - Monitoring Kubernetes Cluster
Node resource utilization
- Network bandwidth
- Disk utilization
- CPU
- Memory
The number of nodes
- Number of nodes available
- What you are paying for
- Discover what the cluster is being used for
Running pods
- Is the number of available nodes sufficient?
- Can they handle the entire workload if a node fails?
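The "can the remaining nodes handle the workload if a node fails?" question on this slide can be sketched as a simple capacity check. This is a rough illustration, not a scheduler model: the function name and inputs (allocatable CPU cores per node, requested CPU cores per pod) are assumptions, and it compares totals only.

```python
def survives_node_failure(node_capacity_cpu, pod_requests_cpu):
    """Return True if total pod CPU requests still fit after the largest node is lost.

    node_capacity_cpu: hypothetical list of allocatable CPU cores, one entry per node.
    pod_requests_cpu:  hypothetical list of requested CPU cores, one entry per pod.
    Compares totals only; ignores memory, bin-packing, and scheduling constraints.
    """
    if len(node_capacity_cpu) < 2:
        return False  # with zero or one node there is nowhere to fail over to
    remaining = sum(node_capacity_cpu) - max(node_capacity_cpu)
    return sum(pod_requests_cpu) <= remaining


if __name__ == "__main__":
    nodes = [4.0, 4.0, 4.0]
    print(survives_node_failure(nodes, [1.0] * 7))  # 7 cores requested, 8 remain
    print(survives_node_failure(nodes, [1.0] * 9))  # 9 cores requested, 8 remain
```

Dropping the largest node is the pessimistic case; a real check would also look at memory and per-pod placement.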
19. Sumo Logic confidential
K8s Metrics - Monitoring Pod
Kubernetes Metrics
- Monitor how a specific pod and its deployment are being handled
- The number of instances a pod has at the moment and how many were expected
- How the in-progress deployment is going (how many instances were changed from an older version to a new one), health checks, and some network data available through network services
Container Metrics
- Collected by cAdvisor and exposed by Heapster, which queries every node about the running containers
- Metrics like CPU, network, and memory usage compared with the maximum allowed are the highlights
Application Metrics
- Developed by the application itself and related to the business rules it addresses
- For example, a database application exposing metrics related to index state and statistics concerning tables and relationships
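The "instances now vs. expected" and "how the deployment is going" checks on this slide reduce to comparing a few replica counts. A minimal sketch, assuming the three counts have already been read from the cluster; the function and key names are hypothetical.

```python
def rollout_status(desired, available, updated):
    """Summarize a deployment rollout from three replica counts.

    desired:   instances the deployment is expected to run
    available: instances currently running and ready
    updated:   instances already moved to the new version
    All names are illustrative, not a Kubernetes API.
    """
    return {
        "missing": max(desired - available, 0),
        "pending_update": max(desired - updated, 0),
        "healthy": available >= desired and updated >= desired,
    }


if __name__ == "__main__":
    print(rollout_status(desired=5, available=4, updated=3))
```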
20. Sumo Logic confidential
Sources of Metrics in Kubernetes
Node Metrics from node_exporter
- node_exporter installed as a DaemonSet
- 1 instance per node
- Standard host metrics:
  - Load average
  - CPU
  - Memory
  - Disk
  - Network
Container Metrics from cAdvisor
- Embedded into the Kubelet, so we scrape the Kubelet to get container metrics
- For each container on the node:
  - CPU usage
  - Filesystem read/write/limits
  - Memory usage and limits
  - Network transmit/receive/dropped
K8s Metrics from K8s API Server
- Also called "K8s Core Metrics"
- Metrics about the performance of the K8s API server
- Performance of controller work queues
- Request rates and latencies
- etcd helper cache work queues and cache performance
- General process status (file descriptors/memory/CPU seconds)
- Golang status (GC/memory/threads)
100 unique series in a typical node
21. Sumo Logic confidential
Source of Metrics in Kubernetes
K8s-derived metrics from kube-state-metrics
- Counts & metadata about many k8s types
- Count of many 'nouns'
- Resource limits
- Container states
  - Ready/restarts/running/terminated/waiting
Etcd metrics from etcd
- Etcd is the "master of all truth" within a k8s cluster
- Leader existence and leader change rate
- Disk write performance
- Inbound gRPC stats
- etcd_http_received_total
- etcd_http_failed_total
- etcd_http_successful_duration_*
23. Sumo Logic confidential
#1: Collect Metrics at Container Level but Alert at Service Level
$ cat /etc/docker/daemon.json
{
"metrics-addr" : "127.0.0.1:9323",
"experimental" : true
}
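One way to read "collect at container level, alert at service level": keep every per-container sample, but evaluate the alert condition on the aggregate for the service. The sketch below assumes samples have already been collected as (service, container_id, cpu_fraction) tuples; the tuple shape, function name, and 0.8 threshold are illustrative assumptions.

```python
from collections import defaultdict


def service_alerts(samples, threshold=0.8):
    """Return the sorted names of services whose mean CPU fraction exceeds threshold.

    samples: iterable of (service, container_id, cpu_fraction) tuples (hypothetical shape).
    A single hot container does not alert on its own; only the service average does.
    """
    by_service = defaultdict(list)
    for service, _container_id, cpu_fraction in samples:
        by_service[service].append(cpu_fraction)
    return sorted(
        service
        for service, values in by_service.items()
        if sum(values) / len(values) > threshold
    )


if __name__ == "__main__":
    samples = [
        ("web", "a1", 0.95),  # one busy container, but the service average is fine
        ("web", "b2", 0.30),
        ("db", "c3", 0.90),
        ("db", "d4", 0.85),
    ]
    print(service_alerts(samples))
```

Averaging is only one possible aggregate; a percentile or a count of unhealthy containers may suit some services better.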
24. Sumo Logic confidential
#2: Monitor Service Level Objective(SLO) per Service per Route
• Error Rate per Service per route
• Latency per Service per route
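A hedged sketch of the two indicators above, computed per (service, route) pair. The request tuple shape, the choice to count HTTP 5xx as errors, and the use of a nearest-rank 95th percentile for latency are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def slo_report(requests):
    """Compute error rate and p95 latency per (service, route).

    requests: iterable of (service, route, status_code, latency_ms) tuples (hypothetical shape).
    Errors are assumed to be responses with status_code >= 500.
    """
    grouped = defaultdict(list)
    for service, route, status_code, latency_ms in requests:
        grouped[(service, route)].append((status_code, latency_ms))

    report = {}
    for key, rows in grouped.items():
        errors = sum(1 for status_code, _ in rows if status_code >= 500)
        latencies = sorted(latency_ms for _, latency_ms in rows)
        # Nearest-rank percentile: smallest sample with at least 95% of samples at or below it.
        rank = max(1, math.ceil(0.95 * len(latencies)))
        report[key] = {
            "error_rate": errors / len(rows),
            "p95_ms": latencies[rank - 1],
        }
    return report


if __name__ == "__main__":
    requests = [
        ("cart", "/checkout", 200, 100),
        ("cart", "/checkout", 500, 300),
        ("cart", "/checkout", 200, 120),
        ("cart", "/checkout", 200, 110),
        ("cart", "/items", 200, 50),
    ]
    for key, stats in slo_report(requests).items():
        print(key, stats)
```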
25. Sumo Logic confidential
#3: Infra Metrics: Utilization
- Resource availability for Pods vs. allocation
- Verify every Pod/Container has a limit (best practice)
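The "every container has a limit" check can be sketched over Pod manifests that have already been loaded into dictionaries. The function name is hypothetical; the fields it reads (`metadata.name`, `spec.containers[].resources.limits`) follow the standard Pod manifest layout, and requiring both `cpu` and `memory` limits is an assumption.

```python
def containers_without_limits(pods):
    """Return (pod_name, container_name) pairs lacking a CPU or memory limit.

    pods: list of dicts shaped like trimmed Kubernetes Pod manifests.
    """
    missing = []
    for pod in pods:
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"]["containers"]:
            # "resources" or "limits" may be absent or null in a manifest.
            limits = (container.get("resources") or {}).get("limits") or {}
            if "cpu" not in limits or "memory" not in limits:
                missing.append((pod_name, container["name"]))
    return missing


if __name__ == "__main__":
    pods = [
        {
            "metadata": {"name": "web-1"},
            "spec": {"containers": [
                {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
                {"name": "sidecar"},
            ]},
        },
    ]
    print(containers_without_limits(pods))
```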
26. Sumo Logic confidential
#4: Always alert on High Disk Usage
• Monitor ALL disk volumes, including the root file system.
• Kubernetes Node Exporter provides a nice metric for tracking devices
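A minimal sketch of a disk-usage alert across all volumes, root included. It assumes size and available bytes per mountpoint have already been read, for example from node_exporter's `node_filesystem_size_bytes` and `node_filesystem_avail_bytes` series; the function name, input shape, and 80% threshold are illustrative assumptions.

```python
def full_filesystems(filesystems, threshold=0.8):
    """Return {mountpoint: used_fraction} for filesystems at or above threshold.

    filesystems: {mountpoint: (size_bytes, avail_bytes)} (hypothetical shape).
    """
    alerts = {}
    for mountpoint, (size_bytes, avail_bytes) in filesystems.items():
        if size_bytes <= 0:
            continue  # skip pseudo-filesystems that report no size
        used_fraction = 1 - avail_bytes / size_bytes
        if used_fraction >= threshold:
            alerts[mountpoint] = round(used_fraction, 3)
    return alerts


if __name__ == "__main__":
    filesystems = {
        "/": (100_000_000_000, 10_000_000_000),
        "/var/lib/docker": (200_000_000_000, 100_000_000_000),
    }
    print(full_filesystems(filesystems))
```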
27. Sumo Logic confidential
#5: Never ignore Kube-system
• Total DNS Requests - Resource Issue, Scaling Limits, Application Bug
• DNS Request Time - High Latency
• Quorum Loss in the cluster/Failure in Leader Election
• Unusually High Snapshot Duration
• Network criticality
28. Sumo Logic confidential
#6: Consistent Metadata Enrichment
Tag individual Kubernetes components so that they provide context for your services
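Consistent enrichment can be sketched as attaching the same set of context labels to every metric record before it is stored. The record shape, the lookup table keyed by pod name, and the label set (namespace, deployment, service) are assumptions for illustration.

```python
def enrich(metric, pod_metadata):
    """Return a copy of metric with Kubernetes context labels added.

    metric:       {"name": ..., "value": ..., "labels": {"pod": ...}} (hypothetical shape)
    pod_metadata: {pod_name: {"namespace": ..., "deployment": ..., "service": ...}}
    Labels already present on the metric are kept as they are.
    """
    labels = dict(metric.get("labels", {}))
    context = pod_metadata.get(labels.get("pod"), {})
    for key in ("namespace", "deployment", "service"):
        if key in context:
            labels.setdefault(key, context[key])
    return {**metric, "labels": labels}


if __name__ == "__main__":
    pod_metadata = {
        "web-7d9f": {"namespace": "shop", "deployment": "web", "service": "web-svc"},
    }
    metric = {"name": "cpu_usage", "value": 0.4, "labels": {"pod": "web-7d9f"}}
    print(enrich(metric, pod_metadata))
```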
29. Sumo Logic confidential
#7: No Better KPI than API - Track the API Gateway for Microservices to automatically detect application issues
30. Sumo Logic confidential
Discoverability - Infrastructure vs. Service View
Infrastructure-centric Viewpoint
- Complex
- Slow to find and troubleshoot issues
- Disconnected from the customer reality
Service-centric Viewpoint
- Simple to understand
- Quick to find and troubleshoot issues
- Tightly connected to the customer reality
31. Sumo Logic K8s Monitoring and Troubleshooting
• Delivers a best-in-class, end-to-end Kubernetes Monitoring and Troubleshooting experience.
• Open source collectors (Fluent Bit, Fluentd, Prometheus, Falco)
• Visualize K8s hierarchies through Deployment, Service, Node and Namespace views
• Honeycomb visualization - quick overview of data in a visually digestible way.
• Simplified Monitoring and Troubleshooting
• Correlation of Logs, Metrics, Events, and Security
• Integrated security with Falco + partner apps
33. Sumo Logic Confidential
Our Kubernetes Partner Apps - Security
App Purpose Details
SecOps Provides a comprehensive monitoring and analysis solution for detecting vulnerabilities and potential threats throughout your environment, including hosts, containers, images and registry.
SecOps Helps you detect, investigate and remediate vulnerabilities, insecure configurations and compliance violations across all container and Kubernetes environments.
SecOps Provides granular security and compliance control monitoring to DevSecOps teams throughout the cloud native application lifecycle, from development to runtime in production.
SecOps Gives customers the ability to detect, investigate, and remediate vulnerabilities in software artifacts across their deployment environments.
34. Sumo Logic Confidential
Ecosystem - Unified K8s DevOps and SecOps Monitoring
CI/CD: CircleCI, Codefresh, Armory, Harness
DevOps: Kubernetes, Amazon EKS, Google Kubernetes Service, Azure Kubernetes Service
SecOps: Falco, Twistlock, StackRox, Aqua, Tigera, JFrog Xray