SlideShare a Scribd company logo
1 of 33
Download to read offline
@bob_cotton@bob_cotton
Reveal Your Deepest Kubernetes Metrics
KubeCon EU 2018
@bob_cotton
About Me
● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation
● bob@freshtracks.io
● @bob_cotton
● Father, Fly Fisher & Avid homebrewer
@bob_cotton
Agenda
● Determining Important Metrics
○ Four Golden Signals
○ USE Method
○ RED Method
● Sources of metrics
○ Node
○ kubelet and containers
○ Kubernetes API
○ etcd
○ Derived metrics (kube-state-metrics)
● Metric Aggregation through the Kubernetes Hierarchy
@bob_cotton@bob_cotton
What are the Important Metrics?
@bob_cotton
Four Golden Signals
● Latency
○ The time it takes to service a request.
● Errors
○ The rate of requests that fail, either explicitly, implicitly, or by policy
● Traffic
○ A measure of how much demand is being placed on your system
● Saturation
○ How "full" your service is.
@bob_cotton
USE Method
● Introduced by Brendan Gregg for reasoning about system resources
○ Resources are all physical server functional components (CPUs, disks, busses…)
● Utilization
○ The average time that the resource was busy servicing work
● Saturation
○ The degree to which the resource has extra work which it can't service, often queued
● Errors
○ The count of error events
@bob_cotton
RED Method
● Introduced by Tom Wilkie
○ A subset of the Four Golden Signals for measuring Services
● Rate
○ The number of requests per second
● Errors
○ The number of errors per second
● Duration
○ The length of time required to service the request
@bob_cotton
USE is for Resources
RED is for Services
Kubernetes Has Both!
@bob_cotton@bob_cotton
Sources of Metrics in Kubernetes
@bob_cotton
Node
Node Metrics from node_exporter
● node_exporter installed a DaemonSet
○ One instance per node
● Standard Host Metrics
○ Load Average
○ CPU
○ Memory
○ Disk
○ Network
○ Many others
● ~1000 Unique series in a typical node
/metrics
node_exporter
@bob_cotton
USE for Node CPU
Utilization node_cpu
sum(rate(node_cpu{mode!=”idle”,mode!=”iowait”,
mode!~”^(?:guest.*)$”}[5m])) BY (instance)
Saturation node_load1
sum(node_load1) by (node) / count(node_cpu{mode="system"})by
(node) * 100
Errors N/A Not exposed by node_exporter
@bob_cotton
USE for Node Memory
Utilization node_memory_MemAvailable
node_memory_MemTotal
kube_node_status_capacity_memory_bytes
kube_node_status_allocatable_memory_bytes
1 - sum(node_memory_MemAvailable) by (node)/
sum(node_memory_MemTotal) by (node)
1- sum(kube_node_status_allocatable_memory_bytes)
by (exported_node) /
sum(kube_node_status_capacity_memory_bytes) by
(exported_node)
Saturation Don’t go into swap!
Errors node_edac_correctable_errors_total
node_edac_uncorrectable_errors_total
node_edac_csrow_correctable_errors_total
node_edac_csrow_uncorrectable_errors_total
Only available on some systems
@bob_cotton
Container Metrics from cAdvisor
● cAdvisor is embedded into the kubelet, so we scrape the kubelet to get container metrics
● These are the so-called Kubernetes “core” metrics
● For each container on the node:
○ CPU Usage (user and system) and time throttled
○ Filesystem read/writes/limits
○ Memory usage and limits
○ Network transmit/receive/dropped
Node
/metrics
kubelet
cAdvisor
node_exporter
@bob_cotton
USE for Container CPU
Utilization container_cpu_usage_seconds_total sum(rate(
container_cpu_usage_seconds_total[5m]))
by (container_name)
Saturation container_cpu_cfs_throttled_seconds_total sum(rate(
container_cpu_cfs_throttled_seconds_total[5m]) by
(container_name)
Errors N/A
@bob_cotton
USE for Container Memory
Utilization container_memory_usage_bytes
container_memory_working_set_bytes
sum(container_memory_working_set_bytes{name!~"POD"})
by (name)
Saturation Ratio of:
container_memory_working_set_bytes /
kube_pod_container_resource_limits_m
emory_bytes
sum(container_memory_working_set_bytes)
by (container_name) /
sum(label_join(kube_pod_container_resource_limits_memory_b
ytes, "container_name", "", "container"))
by (container_name)
Errors container_memory_failcnt -- Number
of memory usage hits limits.
container_memory_failures_total --
Cumulative count of memory
allocation failures.
sum(rate(
container_memory_failures_total
{type="pgmajfault"}[5m]))
by (container_name)
@bob_cotton
Kubernetes Metrics from the K8s API Server
● Metrics about the performance of the K8s API Server
○ Performance of controller work queues
○ Request Rates and Latencies
○ Etcd helper cache work queues and cache performance
○ General process status (File Descriptors/Memory/CPU Seconds)
○ Golang status (GC/Memory/Threads)
Node /metrics
kubelet
cAdvisor
node_exporter
Any other Pod
API Server
@bob_cotton
RED for Kubernetes API Server
Rate apiserver_request_count sum(rate(apiserver_request_count[5m])) by (verb)
Errors apiserver_request_count rate(apiserver_request_count{code=~"^(?:5..)$"}[5m
]) / rate(apiserver_request_count[5m])
Duration apiserver_request_latencies_bucket histogram_quantile(0.9,
rate(apiserver_request_latencies_bucket[5m])) / 1e+06
@bob_cotton
K8s Derived Metrics from kube-state-metrics
● Counts and metadata about many K8s types
○ Counts of many “nouns”
○ Resource Limits
○ Container states
■ ready/restarts/running/terminated/waiting
● *_labels series carries labels
○ Series has a constant value of 1
○ Join to other series for on-the-fly labeling using left_join
@bob_cotton
Etcd Metrics from etcd
● Etcd is “master of all truth” within a K8s cluster
○ Leader existence and leader change rate
○ Proposals committed/applied/pending/failed
○ Disk write performance
○ Inbound gRPC stats
■ etcd_http_received_total
■ etcd_http_failed_total
■ etcd_http_successful_duration_seconds_bucket
○ Intra-cluster gRPC stats
■ etcd_network_member_round_trip_time_seconds_bucket
■ ...
@bob_cotton
Core Metrics Aggregation
Namespace
Deployment
Pod
Container
● K8s clusters form a hierarchy
● We can aggregate the “core” metrics to any level
● This allows for some interesting monitoring opportunities
● Using Prometheus “recording rules” aggregate the core metrics
at every level
● Insights into all levels of your Kubernetes cluster
● This also applies to any custom application metric
@bob_cotton
Thanks
@bob_cotton
Resources
● USE Method
● RED Method
● Deep Dive into Kubernetes Metrics
● kube-state-metrics
@bob_cotton@bob_cotton
Scheduling and Autoscaling
i.e. The Metrics Pipeline
@bob_cotton
The New “Metrics Server”
● Replaces Heapster
● Standard (versioned and auth) API aggregated into the K8s API Server
● In “beta” in K8s 1.8
● Used by the scheduler and (eventually) the Horizontal Pod Autoscaler
● A stripped-down version of Heapster
● Reports on “core” metrics (CPU/Memory/Network) gathered from cAdvisor
● For internal to K8s use only.
● Pluggable for custom metrics
@bob_cotton
@bob_cotton
Feeding the Horizontal Pod Autoscaler
● Before the metrics server the HPA utilized Heapster for it’s Core metrics
○ This will be the metrics-server going forward
● API Adapter will bridge to third party monitoring system
○ e.g. Prometheus
@bob_cotton@bob_cotton
Labels, Re-Label and Recording Rules
Oh My...
@bob_cotton
Label/Value Based Data Model
● Graphite/StatsD
○ apache.192-168-5-1.home.200.http_request_total
○ apache.192-168-5-1.home.500.http_request_total
○ apache.192-168-5-1.about.200.http_request_total
● Prometheus
○ http_request_total{job=”apache”, instance=”192.168.5.1”, path=”/home”,
status=”200”}
○ http_request_total{job=”apache”, instance=”192.168.5.1”, path=”/home”,
status=”500”}
○ http_request_total{job=”apache”, instance=”192.168.5.1”, path=”/about”,
status=”200”}
● Selecting Series
○ *.*.home.200.*.http_requests_total
○ http_requests_total{status=”200”, path=”/home”}
@bob_cotton
Kubernetes Labels
● Kubernetes gives us labels on all the things
● Our scrape targets live in the context of the K8s labels
○ This comes from service discovery
● We want to enhance the scraped metric labels with K8s labels
● This is why we need relabel rules in Prometheus
@bob_cotton
Prometheus
K8s API Server
TSDB
Kublet
(cAdvisor)
node-exporter
kube_state_metrics
App containers
other exporters
node_exporter
App containers
Kublet
(cAdvisor)
Service Discovery
@bob_cotton
K8s API Server
TSDB
Scrape Target
Service Discovery
Prometheus
0="{__address__ 300.196.17.41}"
1="{__meta_kubernetes_namespace default}"
2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}"
3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}"
4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}"
5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu
request for container prometheus-configmap-reload; cpu request for container data-sidecar}"
6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}"
7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}"
8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}"
9="{__meta_kubernetes_pod_host_ip 172.20.42.119}"
10="{__meta_kubernetes_pod_ip 100.96.17.41}"
11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}"
12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}"
13="{__meta_kubernetes_pod_label_run data-sidecar}"
14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}"
15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}"
16="{__meta_kubernetes_pod_ready false}"
17="{__metrics_path__ /metrics}"
18="{__scheme__ http}"
19="{job ftio-data-sidecar-calc}"
<relabel_config>
{__address__ 300.196.17.41:8077}
{__scheme__ http}
{__metrics_path__ /metrics}
{job ftio-data-sidecar-calc}
{kubernetes_namespace default}
{container_name prometheus-configmap-reload}
http_requests_total{region=”us-east”,
az=”us-east-1”, instance_type=”m2.xlarge”,
instance=”i-3582k8”, hostname=”host1”} = 5439
http_requests_total{region=”us-east”,
az=”us-east-1”,
instance_type=”m2.xlarge”,
instance=”i-3582k8”,
hostname=”host1”,
instance=”300.196.17.41:8077”,
job=”ftio-data-sidecar-calc”,
kubernetes_namespace=”default”,
container_name=”prometheus-configmap-reload”,
} = 5439
<metric_relabel_config>
@bob_cotton
Recording Rules
Create a new series, derived from one or more existing series
# The name of the time series to output to. Must be a valid metric name.
record: <string>
# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>
# Labels to add or overwrite before storing the result.
labels:
[ <labelname>: <labelvalue> ]
@bob_cotton
Recording Rules
Create a new series, derived from one or more existing series
record: pod_name:cpu_usage_seconds:rate5m
expr: sum(rate(container_cpu_usage_seconds_total{pod_name=~"^(?:.+)$"}[5m]))
BY (pod_name)
labels:
ft_target: "true"

More Related Content

What's hot

Robust Operations of Kafka Streams
Robust Operations of Kafka StreamsRobust Operations of Kafka Streams
Robust Operations of Kafka Streamsconfluent
 
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021StreamNative
 
Kubernetes Requests and Limits
Kubernetes Requests and LimitsKubernetes Requests and Limits
Kubernetes Requests and LimitsAhmed AbouZaid
 
War Stories: DIY Kafka
War Stories: DIY KafkaWar Stories: DIY Kafka
War Stories: DIY Kafkaconfluent
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...HostedbyConfluent
 
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps WayDevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Waysmalltown
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchInfluxData
 
OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ...
OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ...OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ...
OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ...NETWAYS
 
Streaming millions of Contact Center interactions in (near) real-time with Pu...
Streaming millions of Contact Center interactions in (near) real-time with Pu...Streaming millions of Contact Center interactions in (near) real-time with Pu...
Streaming millions of Contact Center interactions in (near) real-time with Pu...Frank Kelly
 
Serverless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipelineServerless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipelineShu-Jeng Hsieh
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksRuslan Meshenberg
 
2016 08-30 Kubernetes talk for Waterloo DevOps
2016 08-30 Kubernetes talk for Waterloo DevOps2016 08-30 Kubernetes talk for Waterloo DevOps
2016 08-30 Kubernetes talk for Waterloo DevOpscraigbox
 
K8s best practices from the field!
K8s best practices from the field!K8s best practices from the field!
K8s best practices from the field!DoiT International
 
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...PGConf APAC
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 
Kubernetes Overview - Deploy your app with confidence
Kubernetes Overview - Deploy your app with confidenceKubernetes Overview - Deploy your app with confidence
Kubernetes Overview - Deploy your app with confidenceOmer Barel
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!Jonathan Katz
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsdatamantra
 
Where is my cache architectural patterns for caching microservices by example
Where is my cache  architectural patterns for caching microservices by exampleWhere is my cache  architectural patterns for caching microservices by example
Where is my cache architectural patterns for caching microservices by exampleRafał Leszko
 

What's hot (20)

Robust Operations of Kafka Streams
Robust Operations of Kafka StreamsRobust Operations of Kafka Streams
Robust Operations of Kafka Streams
 
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
 
Kubernetes Requests and Limits
Kubernetes Requests and LimitsKubernetes Requests and Limits
Kubernetes Requests and Limits
 
War Stories: DIY Kafka
War Stories: DIY KafkaWar Stories: DIY Kafka
War Stories: DIY Kafka
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
 
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps WayDevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
 
OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ...
OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ...OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ...
OSDC 2018 | Lifecycle of a resource. Codifying infrastructure with Terraform ...
 
Streaming millions of Contact Center interactions in (near) real-time with Pu...
Streaming millions of Contact Center interactions in (near) real-time with Pu...Streaming millions of Contact Center interactions in (near) real-time with Pu...
Streaming millions of Contact Center interactions in (near) real-time with Pu...
 
Serverless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipelineServerless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipeline
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talks
 
2016 08-30 Kubernetes talk for Waterloo DevOps
2016 08-30 Kubernetes talk for Waterloo DevOps2016 08-30 Kubernetes talk for Waterloo DevOps
2016 08-30 Kubernetes talk for Waterloo DevOps
 
K8s best practices from the field!
K8s best practices from the field!K8s best practices from the field!
K8s best practices from the field!
 
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
Kubernetes Overview - Deploy your app with confidence
Kubernetes Overview - Deploy your app with confidenceKubernetes Overview - Deploy your app with confidence
Kubernetes Overview - Deploy your app with confidence
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Where is my cache architectural patterns for caching microservices by example
Where is my cache  architectural patterns for caching microservices by exampleWhere is my cache  architectural patterns for caching microservices by example
Where is my cache architectural patterns for caching microservices by example
 

Similar to 20180503 kube con eu kubernetes metrics deep dive

Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Idan Atias
 
Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 reviewManageIQ
 
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기Jinsu Moon
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes IntroductionMiloš Zubal
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDinakar Guniguntala
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKevin Lynch
 
Integrating ChatGPT with Apache Airflow
Integrating ChatGPT with Apache AirflowIntegrating ChatGPT with Apache Airflow
Integrating ChatGPT with Apache AirflowTatiana Al-Chueyr
 
Accumulo Summit Keynote 2018
Accumulo Summit Keynote 2018Accumulo Summit Keynote 2018
Accumulo Summit Keynote 2018Accumulo Summit
 
How to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverHow to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverEDB
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For OperatorsKevin Brockhoff
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on KubernetesJoerg Henning
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthScyllaDB
 
Kubernetes #2 monitoring
Kubernetes #2   monitoring Kubernetes #2   monitoring
Kubernetes #2 monitoring Terry Cho
 
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsOrchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsCloud Native Day Tel Aviv
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Key considerations in productionizing streaming applications
Key considerations in productionizing streaming applicationsKey considerations in productionizing streaming applications
Key considerations in productionizing streaming applicationsKafkaZone
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusTobias Schmidt
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle ManagementDoKC
 

Similar to 20180503 kube con eu kubernetes metrics deep dive (20)

Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)
 
Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 review
 
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on Kubernetes
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 
Integrating ChatGPT with Apache Airflow
Integrating ChatGPT with Apache AirflowIntegrating ChatGPT with Apache Airflow
Integrating ChatGPT with Apache Airflow
 
Accumulo Summit Keynote 2018
Accumulo Summit Keynote 2018Accumulo Summit Keynote 2018
Accumulo Summit Keynote 2018
 
How to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverHow to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL server
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on Kubernetes
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
 
Kubernetes #2 monitoring
Kubernetes #2   monitoring Kubernetes #2   monitoring
Kubernetes #2 monitoring
 
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsOrchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Key considerations in productionizing streaming applications
Key considerations in productionizing streaming applicationsKey considerations in productionizing streaming applications
Key considerations in productionizing streaming applications
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle Management
 

Recently uploaded

TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 
Unleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IMUnleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IMMarco Wobben
 
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...ferisulianta.com
 
Empowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsEmpowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsGain Insights
 
Using DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data WarehouseUsing DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data WarehouseThinkInnovation
 
Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1bengalurutug
 
The market for cross-border mortgages in Europe
The market for cross-border mortgages in EuropeThe market for cross-border mortgages in Europe
The market for cross-border mortgages in Europe321k
 
Paul Martin (Gartner) - Show Me the AI Money.pdf
Paul Martin (Gartner) - Show Me the AI Money.pdfPaul Martin (Gartner) - Show Me the AI Money.pdf
Paul Martin (Gartner) - Show Me the AI Money.pdfdcphostmaster
 
Air Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfAir Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfJasonBoboKyaw
 
Microeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfMicroeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfmxlos0
 
How to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentHow to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentAggregage
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingMarketingTrips
 
PPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggPPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggbhadratanusenapati1
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media PlatformsMahmoud Yasser
 
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptxFurkanTasci3
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxjkmrshll88
 

Recently uploaded (20)

TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
Unleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IMUnleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IM
 
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
 
Empowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsEmpowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded Analytics
 
Using DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data WarehouseUsing DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data Warehouse
 
Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1
 
The market for cross-border mortgages in Europe
The market for cross-border mortgages in EuropeThe market for cross-border mortgages in Europe
The market for cross-border mortgages in Europe
 
Paul Martin (Gartner) - Show Me the AI Money.pdf
Paul Martin (Gartner) - Show Me the AI Money.pdfPaul Martin (Gartner) - Show Me the AI Money.pdf
Paul Martin (Gartner) - Show Me the AI Money.pdf
 
Target_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110millionTarget_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110million
 
Air Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfAir Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdf
 
Microeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfMicroeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdf
 
How to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentHow to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product Development
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân Marketing
 
PPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggPPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfgggg
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media Platforms
 
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptx
 

20180503 kube con eu kubernetes metrics deep dive

  • 1. @bob_cotton@bob_cotton Reveal Your Deepest Kubernetes Metrics KubeCon EU 2018
  • 2. @bob_cotton About Me ● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation ● bob@freshtracks.io ● @bob_cotton ● Father, Fly Fisher & Avid homebrewer
  • 3. @bob_cotton Agenda ● Determining Important Metrics ○ Four Golden Signals ○ USE Method ○ RED Method ● Sources of metrics ○ Node ○ kubelet and containers ○ Kubernetes API ○ etcd ○ Derived metrics (kube-state-metrics) ● Metric Aggregation through the Kubernetes Hierarchy
  • 5. @bob_cotton Four Golden Signals ● Latency ○ The time it takes to service a request. ● Errors ○ The rate of requests that fail, either explicitly, implicitly, or by policy ● Traffic ○ A measure of how much demand is being placed on your system ● Saturation ○ How "full" your service is.
  • 6. @bob_cotton USE Method ● Introduced by Brendan Gregg for reasoning about system resources ○ Resources are all physical server functional components (CPUs, disks, busses…) ● Utilization ○ The average time that the resource was busy servicing work ● Saturation ○ The degree to which the resource has extra work which it can't service, often queued ● Errors ○ The count of error events
  • 7. @bob_cotton RED Method ● Introduced by Tom Wilkie ○ A subset of the Four Golden Signals for measuring Services ● Rate ○ The number of requests per second ● Errors ○ The number of errors per second ● Duration ○ The length of time required to service the request
  • 8. @bob_cotton USE is for Resources RED is for Services Kubernetes Has Both!
  • 10. @bob_cotton Node Node Metrics from node_exporter ● node_exporter installed a DaemonSet ○ One instance per node ● Standard Host Metrics ○ Load Average ○ CPU ○ Memory ○ Disk ○ Network ○ Many others ● ~1000 Unique series in a typical node /metrics node_exporter
  • 11. @bob_cotton USE for Node CPU Utilization node_cpu sum(rate(node_cpu{mode!=”idle”,mode!=”iowait”, mode!~”^(?:guest.*)$”}[5m])) BY (instance) Saturation node_load1 sum(node_load1) by (node) / count(node_cpu{mode="system"})by (node) * 100 Errors N/A Not exposed by node_exporter
  • 12. @bob_cotton USE for Node Memory Utilization node_memory_MemAvailable node_memory_MemTotal kube_node_status_capacity_memory_bytes kube_node_status_allocatable_memory_bytes 1 - sum(node_memory_MemAvailable) by (node)/ sum(node_memory_MemTotal) by (node) 1- sum(kube_node_status_allocatable_memory_bytes) by (exported_node) / sum(kube_node_status_capacity_memory_bytes) by (exported_node) Saturation Don’t go into swap! Errors node_edac_correctable_errors_total node_edac_uncorrectable_errors_total node_edac_csrow_correctable_errors_total node_edac_csrow_uncorrectable_errors_total Only available on some systems
  • 13. @bob_cotton Container Metrics from cAdvisor ● cAdvisor is embedded into the kubelet, so we scrape the kubelet to get container metrics ● These are the so-called Kubernetes “core” metrics ● For each container on the node: ○ CPU Usage (user and system) and time throttled ○ Filesystem read/writes/limits ○ Memory usage and limits ○ Network transmit/receive/dropped Node /metrics kubelet cAdvisor node_exporter
  • 14. @bob_cotton USE for Container CPU Utilization container_cpu_usage_seconds_total sum(rate( container_cpu_usage_seconds_total[5m])) by (container_name) Saturation container_cpu_cfs_throttled_seconds_total sum(rate( container_cpu_cfs_throttled_seconds_total[5m]) by (container_name) Errors N/A
  • 15. @bob_cotton USE for Container Memory Utilization container_memory_usage_bytes container_memory_working_set_bytes sum(container_memory_working_set_bytes{name!~"POD"}) by (name) Saturation Ratio of: container_memory_working_set_bytes / kube_pod_container_resource_limits_m emory_bytes sum(container_memory_working_set_bytes) by (container_name) / sum(label_join(kube_pod_container_resource_limits_memory_b ytes, "container_name", "", "container")) by (container_name) Errors container_memory_failcnt -- Number of memory usage hits limits. container_memory_failures_total -- Cumulative count of memory allocation failures. sum(rate( container_memory_failures_total {type="pgmajfault"}[5m])) by (container_name)
  • 16. @bob_cotton Kubernetes Metrics from the K8s API Server ● Metrics about the performance of the K8s API Server ○ Performance of controller work queues ○ Request Rates and Latencies ○ Etcd helper cache work queues and cache performance ○ General process status (File Descriptors/Memory/CPU Seconds) ○ Golang status (GC/Memory/Threads) Node /metrics kubelet cAdvisor node_exporter Any other Pod API Server
  • 17. @bob_cotton RED for Kubernetes API Server Rate apiserver_request_count sum(rate(apiserver_request_count[5m])) by (verb) Errors apiserver_request_count rate(apiserver_request_count{code=~"^(?:5..)$"}[5m ]) / rate(apiserver_request_count[5m]) Duration apiserver_request_latencies_bucket histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) / 1e+06
  • 18. @bob_cotton K8s Derived Metrics from kube-state-metrics ● Counts and metadata about many K8s types ○ Counts of many “nouns” ○ Resource Limits ○ Container states ■ ready/restarts/running/terminated/waiting ● *_labels series carries labels ○ Series has a constant value of 1 ○ Join to other series for on-the-fly labeling using left_join
  • 19. @bob_cotton Etcd Metrics from etcd ● Etcd is “master of all truth” within a K8s cluster ○ Leader existence and leader change rate ○ Proposals committed/applied/pending/failed ○ Disk write performance ○ Inbound gRPC stats ■ etcd_http_received_total ■ etcd_http_failed_total ■ etcd_http_successful_duration_seconds_bucket ○ Intra-cluster gRPC stats ■ etcd_network_member_round_trip_time_seconds_bucket ■ ...
  • 20. @bob_cotton Core Metrics Aggregation Namespace Deployment Pod Container ● K8s clusters form a hierarchy ● We can aggregate the “core” metrics to any level ● This allows for some interesting monitoring opportunities ● Using Prometheus “recording rules” aggregate the core metrics at every level ● Insights into all levels of your Kubernetes cluster ● This also applies to any custom application metric
  • 22. @bob_cotton Resources ● USE Method ● RED Method ● Deep Dive into Kubernetes Metrics ● kube-state-metrics
  • 24. @bob_cotton The New “Metrics Server” ● Replaces Heapster ● Standard (versioned and auth) API aggregated into the K8s API Server ● In “beta” in K8s 1.8 ● Used by the scheduler and (eventually) the Horizontal Pod Autoscaler ● A stripped-down version of Heapster ● Reports on “core” metrics (CPU/Memory/Network) gathered from cAdvisor ● For internal to K8s use only. ● Pluggable for custom metrics
  • 26. @bob_cotton Feeding the Horizontal Pod Autoscaler ● Before the metrics server the HPA utilized Heapster for it’s Core metrics ○ This will be the metrics-server going forward ● API Adapter will bridge to third party monitoring system ○ e.g. Prometheus
  • 28. @bob_cotton Label/Value Based Data Model ● Graphite/StatsD ○ apache.192-168-5-1.home.200.http_request_total ○ apache.192-168-5-1.home.500.http_request_total ○ apache.192-168-5-1.about.200.http_request_total ● Prometheus ○ http_request_total{job=”apache”, instance=”192.168.5.1”, path=”/home”, status=”200”} ○ http_request_total{job=”apache”, instance=”192.168.5.1”, path=”/home”, status=”500”} ○ http_request_total{job=”apache”, instance=”192.168.5.1”, path=”/about”, status=”200”} ● Selecting Series ○ *.*.home.200.*.http_requests_total ○ http_requests_total{status=”200”, path=”/home”}
  • 29. @bob_cotton Kubernetes Labels ● Kubernetes gives us labels on all the things ● Our scrape targets live in the context of the K8s labels ○ This comes from service discovery ● We want to enhance the scraped metric labels with K8s labels ● This is why we need relabel rules in Prometheus
  • 30. @bob_cotton Prometheus K8s API Server TSDB Kublet (cAdvisor) node-exporter kube_state_metrics App containers other exporters node_exporter App containers Kublet (cAdvisor) Service Discovery
  • 31. @bob_cotton K8s API Server TSDB Scrape Target Service Discovery Prometheus 0="{__address__ 300.196.17.41}" 1="{__meta_kubernetes_namespace default}" 2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}" 3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}" 4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}" 5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar}" 6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}" 7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}" 8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}" 9="{__meta_kubernetes_pod_host_ip 172.20.42.119}" 10="{__meta_kubernetes_pod_ip 100.96.17.41}" 11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}" 12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}" 13="{__meta_kubernetes_pod_label_run data-sidecar}" 14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}" 15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}" 16="{__meta_kubernetes_pod_ready false}" 17="{__metrics_path__ /metrics}" 18="{__scheme__ http}" 19="{job ftio-data-sidecar-calc}" <relabel_config> {__address__ 300.196.17.41:8077} {__scheme__ http} {__metrics_path__ /metrics} {job ftio-data-sidecar-calc} {kubernetes_namespace default} {container_name prometheus-configmap-reload} http_requests_total{region=”us-east”, az=”us-east-1”, instance_type=”m2.xlarge”, instance=”i-3582k8”, hostname=”host1”} = 5439 http_requests_total{region=”us-east”, az=”us-east-1”, instance_type=”m2.xlarge”, instance=”i-3582k8”, hostname=”host1”, instance=”300.196.17.41:8077”, job=”ftio-data-sidecar-calc”, kubernetes_namespace=”default”, container_name=”prometheus-configmap-reload”, } = 5439 <metric_relabel_config>
  • 32. @bob_cotton Recording Rules Create a new series, derived from one or more existing series # The name of the time series to output to. Must be a valid metric name. record: <string> # The PromQL expression to evaluate. Every evaluation cycle this is # evaluated at the current time, and the result recorded as a new set of # time series with the metric name as given by 'record'. expr: <string> # Labels to add or overwrite before storing the result. labels: [ <labelname>: <labelvalue> ]
  • 33. @bob_cotton Recording Rules Create a new series, derived from one or more existing series record: pod_name:cpu_usage_seconds:rate5m expr: sum(rate(container_cpu_usage_seconds_total{pod_name=~"^(?:.+)$"}[5m])) BY (pod_name) labels: ft_target: "true"