SlideShare a Scribd company logo
1 of 63
Download to read offline
Infrastructure & System
Monitoring
using Prometheus
Marco Pas
Philips Lighting
Software geek, hands on
Developer/Architect/DevOps Engineer
@marcopas
Some stuff about me...
● Mostly doing cloud related stuff
○ Java, Groovy, Scala, Spring Boot, IOT, AWS, Terraform, Infrastructure
● Enjoying the good things
● Chef leuke dingen doen == “trying out cool and new stuff”
● Currently involved in a big IOT project
● Wannabe chef, movie & Netflix addict
Agenda
● Monitoring
○ Introducing you to a Scary Movie
● Prometheus overview (demo’s)
○ Running Prometheus
○ Gathering host metrics
○ Introducing Grafana
○ Monitoring Docker containers
○ Alerting
○ Instrumenting your own code
○ Service Discovery (Consul) integration
..Quick Inventory..
I am going to introduce
you to some bad movies
Commonality
between
these movies?
Monitoring
Our scary movie “The Happy Developer”
● Lets push out features
● I can demo so it works :)
● It works with 1 user, so it will work with
multiple
● Don’t worry about performance we will
just scale using multiple
machines/processes
● Logging is into place
Did
anyone
notice?
Disaster Strikes
Logging
“recording to diagnose a system”
Monitoring
“observation, checking and recording”
http_requests_total{method="post",code="200"} 1027 1395066363000
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Logging != Monitoring
Vital Signs
Why Monitoring?
● Know when things go wrong
○ Detection & Alerting
● Be able to debug and gain insight
● Detect changes over time and
drive technical/business decisions
● Feed into other systems/processes
(e.g. security, automation)
What to monitor?
IT Network
Operating
System
Services
Applications
Capture
Monitoring
Information
Functional
Monitoring
Operational
Monitoring
metric data
Houston we have Storage problem!
Storage
metric data
metric data
metric data
metric data
metric data
metric data
metric data
metric data
metric data
How to store the mass amount of
metrics and also making them easy
to query?
Time Series - Database
● Time series data is a sequence of data points collected at regular intervals
over a period of time. (metrics)
○ Examples:
■ Device data
■ Weather data
■ Stock prices
■ Tide measurements
■ Solar flare tracking
● The data requires aggregation and analysis
Time Series
Database
metric data
● High write performance
● Data compaction
● Fast, easy range queries
metric name and a set of key-value pairs, also known as labels
<metric name>{<label name>=<label value>, ...} value [ timestamp ]
http_requests_total{method="post",code="200"} 1027 1395066363000
Time Series - Data format
Source:
http://db-engines.com/en/ranking/time+series+dbmshttp://db-engines.com/en/ranking/time+series+dbms
Prometheus Overview
Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally
built at SoundCloud. It is now a standalone open source project and maintained
independently of any company.
https://prometheus.io
Implemented using
Prometheus Components
● The main Prometheus server which scrapes and stores time series data
● Client libraries for instrumenting application code
● A push gateway for supporting short-lived jobs
● Special-purpose exporters (for HAProxy, StatsD, Graphite, etc.)
● An alertmanager
● Various support tools
● WhiteBox Monitoring instead of probing [aka BlackBox Monitoring]
Prometheus Overview
List of Job Exporters
● Prometheus managed:
○ JMX
○ Node
○ Graphite
○ Blackbox
○ SNMP
○ HAProxy
○ Consul
○ Memcached
○ AWS Cloudwatch
○ InfluxDB
○ StatsD
○ ...
● Custom ones:
○ Database
○ Hardware related
○ Messaging systems
○ Storage
○ HTTP
○ APIs
○ Logging
○ …
https://prometheus.io/docs/instrumenting/exporters/
Demo Structure
Demo: Run Prometheus (native)
# file: prometheus.yml
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
# some settings intentionally removed!!
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
Code Demo
“Running Prometheus Native”
Demo: Run Prometheus using Docker
34
# file: docker-compose.yml
version: '2'
services:
prometheus:
image: prom/prometheus:latest → Using official prometheus container
volumes:
- $PWD:/etc/prometheus → Mount local directory used for config + data
ports:
- "9090:9090" → Port mapping used for this container host:container
command:
- "-config.file=/etc/prometheus/prometheus.yml" → Prometheus configuration
Code Demo
“Running Prometheus Dockerized”
Demo: Add host metrics
# file: docker-compose.yml
version: '2'
services:
prometheus: → Runnning prometheus as Docker container
image: prom/prometheus:latest → Using official prometheus container
volumes:
- $PWD:/etc/prometheus → Mount local directory used for config + data
ports:
- "9090:9090" → Port mapping used for this container host:container
command:
- "-config.file=/etc/prometheus/prometheus.yml" → Prometheus configuration
node-exporter:
image: prom/node-exporter:latest → Using node exporter as an additional container
ports:
- '9100:9100' → Port mapping used for this container host:container
38
# file: prometheus.yml
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
# some settings intentionally removed!!
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'node-exporter'
static_configs:
- targets: ['node-exporter:9100']
Code Demo
“Add host metrics”
Demo: Grafana
40
# file: docker-compose.yml
version: '2'
services:
# some code intentionally removed!!
grafana:
image: grafana/grafana:latest → Using official prometheus container
ports:
- "3000:3000" → Port mapping used for this container host:container
You get the idea :)
Code Demo
“Grafana”
Demo: Monitor Docker containers
Code Demo
“cAdvisor”
Demo: Alerting
Alerting Configuration
● Alert Rules
○ What are the settings where we
need to alert upon?
● Alert Manager
○ Where do we need to send the alert
to?
# file: alert.rules
ALERT serviceDownAlert
IF absent(((time() - container_last_seen{name="<service_name>"}) < 5))
FOR 5s
LABELS {
severity = "critical", → setting the labels so we can use them in the AlertManager
service = "backend"
}
ANNOTATIONS { → information used in the alert event
SUMMARY = "Container Instance down",
DESCRIPTION = "Container Instance is down for more than 15 sec."
}
# file: alert-manager.yml
global: → Global settings
smtp_smarthost: 'mailslurper:2500'
smtp_from: 'alertmanager@example.org'
smtp_require_tls: false
route: → Routing
receiver: mail # Fallback → Fallback is there is no match
routes:
- match:
severity: critical → Match on label!
continue: true → Continue with other receivers if there is a match
receiver: mail → Determine the receiver
- match:
severity: critical
receiver: slack
# file: alert-manager.yml (continued)
receivers:
- name: mail → mail receiver
email_configs:
- to: 'team-X+alerts@example.org'
- name: slack → slack receiver
slack_configs:
- send_resolved: true
username: 'AlertManager'
channel: '#alert'
api_url: 'THIS IS A VERY SECRET URL :)’
# file: prometheus.yml
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- "alert.rules"
# some settings intentionally removed!!
Code Demo
“Alerting -> The Alert Manager”
Instrumenting your own code!
● Counter
○ A cumulative metric that represents a single numerical value that only ever goes up
● Gauge
○ Single numerical value that can arbitrarily go up and down
● Histogram
○ Samples observations (usually things like request durations or response sizes) and counts
them in configurable buckets. It also provides a sum of all observed values
● Summary
○ Histogram + total count of observations + sum of all observed values, it calculates
configurable quantiles over a sliding time window
Available Languages
● Official
○ Go, Java or Scala, Python, Ruby
● Unofficial
○ Bash, C++, Common Lisp, Elixir, Erlang, Haskell, Lua for Nginx, Lua for Tarantool, .NET / C#,
Node.js, PHP, Rust
// Spring Boot example -> file: build.gradle
dependencies {
compile('org.springframework.boot:spring-boot-starter-web')
testCompile('org.springframework.boot:spring-boot-starter-test')
compile('io.prometheus:simpleclient_spring_boot:0.0.21') → Add dependency
}
Prometheus Client Libaries: SpringBoot Example
@EnablePrometheusEndpoint
@EnableSpringBootMetricsCollector
@RestController
@SpringBootApplication
public class DemoApplication {
public static void main(String[] args) { SpringApplication.run(DemoApplication.class, args); }
static final Counter requests = Counter.build() → create metric type counter
.name("helloworld_requests_total") → set metric name
.help("HelloWorld Total requests.").register(); → register the metric
@RequestMapping("/helloworld")
String home() {
requests.inc(); → increment the counter with 1 (helloworld_requests_total)
return "Hello World!";
}
}
Demo: Application metrics
Code Demo
“Application metrics”
Service Discovery
(Consul) Integration
Demo: Consul Integration
Service Discovery
Demo: Consul integration
Register the services with
Consul and Monitor
1
2
Code Demo
“Consul to the rescue”
That’s a wrap!
Question?
https://github.com/mpas/infrastructure-and-system-monitoring-using-prometheus
Marco Pas
Philips Lighting
Software geek, hands on
Developer/Architect/DevOps Engineer
@marcopas

More Related Content

What's hot

Prometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring SystemPrometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring SystemFabian Reinartz
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to PrometheusJulien Pivotto
 
End to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max IndenEnd to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max IndenParis Container Day
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basicsJuraj Hantak
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Brian Brazil
 
How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?Wojciech Barczyński
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheusCeline George
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus OverviewBrian Brazil
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introductionRico Chen
 
Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialTim Vaillancourt
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy Docker, Inc.
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with PrometheusShiao-An Yuan
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at ScaleFabian Reinartz
 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with PrometheusQAware GmbH
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Brian Brazil
 

What's hot (20)

Prometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring SystemPrometheus – a next-gen Monitoring System
Prometheus – a next-gen Monitoring System
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to Prometheus
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
 
End to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max IndenEnd to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max Inden
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basics
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)
 
Grafana.pptx
Grafana.pptxGrafana.pptx
Grafana.pptx
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?How to monitor your micro-service with Prometheus?
How to monitor your micro-service with Prometheus?
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introduction
 
Monitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_TutorialMonitoring_with_Prometheus_Grafana_Tutorial
Monitoring_with_Prometheus_Grafana_Tutorial
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with Prometheus
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
Prometheus 101
Prometheus 101Prometheus 101
Prometheus 101
 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with Prometheus
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
 

Similar to Infrastructure & System Monitoring using Prometheus

DevOps Braga #15: Agentless monitoring with icinga and prometheus
DevOps Braga #15: Agentless monitoring with icinga and prometheusDevOps Braga #15: Agentless monitoring with icinga and prometheus
DevOps Braga #15: Agentless monitoring with icinga and prometheusDevOps Braga
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...NETWAYS
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Brian Brazil
 
Build reliable, traceable, distributed systems with ZeroMQ
Build reliable, traceable, distributed systems with ZeroMQBuild reliable, traceable, distributed systems with ZeroMQ
Build reliable, traceable, distributed systems with ZeroMQRobin Xiao
 
6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production 6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production Hung Lin
 
Swarm: Native Docker Clustering
Swarm: Native Docker ClusteringSwarm: Native Docker Clustering
Swarm: Native Docker ClusteringRoyee Tager
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Codemotion
 
Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Marco Pas
 
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...InfluxData
 
Angular Optimization Web Performance Meetup
Angular Optimization Web Performance MeetupAngular Optimization Web Performance Meetup
Angular Optimization Web Performance MeetupDavid Barreto
 
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...InfluxData
 
From nothing to Prometheus : one year after
From nothing to Prometheus : one year afterFrom nothing to Prometheus : one year after
From nothing to Prometheus : one year afterAntoine Leroyer
 
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...Amazon Web Services
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209mffiedler
 
Capistrano deploy Magento project in an efficient way
Capistrano deploy Magento project in an efficient wayCapistrano deploy Magento project in an efficient way
Capistrano deploy Magento project in an efficient waySylvain Rayé
 
Devops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftDevops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftYaniv cohen
 
Declarative Infrastructure Tools
Declarative Infrastructure Tools Declarative Infrastructure Tools
Declarative Infrastructure Tools Yulia Shcherbachova
 
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereNew Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereRodrique Heron
 

Similar to Infrastructure & System Monitoring using Prometheus (20)

DevOps Braga #15: Agentless monitoring with icinga and prometheus
DevOps Braga #15: Agentless monitoring with icinga and prometheusDevOps Braga #15: Agentless monitoring with icinga and prometheus
DevOps Braga #15: Agentless monitoring with icinga and prometheus
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
Build reliable, traceable, distributed systems with ZeroMQ
Build reliable, traceable, distributed systems with ZeroMQBuild reliable, traceable, distributed systems with ZeroMQ
Build reliable, traceable, distributed systems with ZeroMQ
 
6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production 6 Months Sailing with Docker in Production
6 Months Sailing with Docker in Production
 
Swarm: Native Docker Clustering
Swarm: Native Docker ClusteringSwarm: Native Docker Clustering
Swarm: Native Docker Clustering
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
 
Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)
 
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
 
System monitoring
System monitoringSystem monitoring
System monitoring
 
Angular Optimization Web Performance Meetup
Angular Optimization Web Performance MeetupAngular Optimization Web Performance Meetup
Angular Optimization Web Performance Meetup
 
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
 
From nothing to Prometheus : one year after
From nothing to Prometheus : one year afterFrom nothing to Prometheus : one year after
From nothing to Prometheus : one year after
 
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209
 
Capistrano deploy Magento project in an efficient way
Capistrano deploy Magento project in an efficient wayCapistrano deploy Magento project in an efficient way
Capistrano deploy Magento project in an efficient way
 
Devops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftDevops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShift
 
Docker Monitoring Webinar
Docker Monitoring  WebinarDocker Monitoring  Webinar
Docker Monitoring Webinar
 
Declarative Infrastructure Tools
Declarative Infrastructure Tools Declarative Infrastructure Tools
Declarative Infrastructure Tools
 
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhereNew Jersey Red Hat Users Group Presentation: Provisioning anywhere
New Jersey Red Hat Users Group Presentation: Provisioning anywhere
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Infrastructure & System Monitoring using Prometheus

  • 1. Infrastructure & System Monitoring using Prometheus Marco Pas Philips Lighting Software geek, hands on Developer/Architect/DevOps Engineer @marcopas
  • 2. Some stuff about me... ● Mostly doing cloud related stuff ○ Java, Groovy, Scala, Spring Boot, IOT, AWS, Terraform, Infrastructure ● Enjoying the good things ● Chef leuke dingen doen == “trying out cool and new stuff” ● Currently involved in a big IOT project ● Wannabe chef, movie & Netflix addict
  • 3. Agenda ● Monitoring ○ Introducing you to a Scary Movie ● Prometheus overview (demo’s) ○ Running Prometheus ○ Gathering host metrics ○ Introducing Grafana ○ Monitoring Docker containers ○ Alerting ○ Instrumenting your own code ○ Service Discovery (Consul) integration
  • 5. I am going to introduce you to some bad movies
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 13.
  • 14. Our scary movie “The Happy Developer” ● Lets push out features ● I can demo so it works :) ● It works with 1 user, so it will work with multiple ● Don’t worry about performance we will just scale using multiple machines/processes ● Logging is into place
  • 16. Logging “recording to diagnose a system” Monitoring “observation, checking and recording” http_requests_total{method="post",code="200"} 1027 1395066363000 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 Logging != Monitoring
  • 18. Why Monitoring? ● Know when things go wrong ○ Detection & Alerting ● Be able to debug and gain insight ● Detect changes over time and drive technical/business decisions ● Feed into other systems/processes (e.g. security, automation)
  • 19. What to monitor? IT Network Operating System Services Applications Capture Monitoring Information Functional Monitoring Operational Monitoring metric data
  • 20. Houston we have Storage problem! Storage metric data metric data metric data metric data metric data metric data metric data metric data metric data How to store the mass amount of metrics and also making them easy to query?
  • 21. Time Series - Database ● Time series data is a sequence of data points collected at regular intervals over a period of time. (metrics) ○ Examples: ■ Device data ■ Weather data ■ Stock prices ■ Tide measurements ■ Solar flare tracking ● The data requires aggregation and analysis Time Series Database metric data ● High write performance ● Data compaction ● Fast, easy range queries
  • 22. metric name and a set of key-value pairs, also known as labels <metric name>{<label name>=<label value>, ...} value [ timestamp ] http_requests_total{method="post",code="200"} 1027 1395066363000 Time Series - Data format
  • 25. Prometheus Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It is now a standalone open source project and maintained independently of any company. https://prometheus.io Implemented using
  • 26. Prometheus Components ● The main Prometheus server which scrapes and stores time series data ● Client libraries for instrumenting application code ● A push gateway for supporting short-lived jobs ● Special-purpose exporters (for HAProxy, StatsD, Graphite, etc.) ● An alertmanager ● Various support tools ● WhiteBox Monitoring instead of probing [aka BlackBox Monitoring]
  • 28. List of Job Exporters ● Prometheus managed: ○ JMX ○ Node ○ Graphite ○ Blackbox ○ SNMP ○ HAProxy ○ Consul ○ Memcached ○ AWS Cloudwatch ○ InfluxDB ○ StatsD ○ ... ● Custom ones: ○ Database ○ Hardware related ○ Messaging systems ○ Storage ○ HTTP ○ APIs ○ Logging ○ … https://prometheus.io/docs/instrumenting/exporters/
  • 31. # file: prometheus.yml global: scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. # some settings intentionally removed!! # A scrape configuration containing exactly one endpoint to scrape: # Here it's Prometheus itself. scrape_configs: # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. - job_name: 'prometheus' static_configs: - targets: ['localhost:9090']
  • 33. Demo: Run Prometheus using Docker
  • 34. 34 # file: docker-compose.yml version: '2' services: prometheus: image: prom/prometheus:latest → Using official prometheus container volumes: - $PWD:/etc/prometheus → Mount local directory used for config + data ports: - "9090:9090" → Port mapping used for this container host:container command: - "-config.file=/etc/prometheus/prometheus.yml" → Prometheus configuration
  • 36. Demo: Add host metrics
  • 37. # file: docker-compose.yml version: '2' services: prometheus: → Runnning prometheus as Docker container image: prom/prometheus:latest → Using official prometheus container volumes: - $PWD:/etc/prometheus → Mount local directory used for config + data ports: - "9090:9090" → Port mapping used for this container host:container command: - "-config.file=/etc/prometheus/prometheus.yml" → Prometheus configuration node-exporter: image: prom/node-exporter:latest → Using node exporter as an additional container ports: - '9100:9100' → Port mapping used for this container host:container
  • 38. 38 # file: prometheus.yml global: scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. # some settings intentionally removed!! # A scrape configuration containing exactly one endpoint to scrape: # Here it's Prometheus itself. scrape_configs: # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. - job_name: 'prometheus' static_configs: - targets: ['localhost:9090'] - job_name: 'node-exporter' static_configs: - targets: ['node-exporter:9100']
  • 39. Code Demo “Add host metrics”
  • 41. # file: docker-compose.yml version: '2' services: # some code intentionally removed!! grafana: image: grafana/grafana:latest → Using official prometheus container ports: - "3000:3000" → Port mapping used for this container host:container You get the idea :)
  • 43. Demo: Monitor Docker containers
  • 46. Alerting Configuration ● Alert Rules ○ What are the settings where we need to alert upon? ● Alert Manager ○ Where do we need to send the alert to?
  • 47. # file: alert.rules ALERT serviceDownAlert IF absent(((time() - container_last_seen{name="<service_name>"}) < 5)) FOR 5s LABELS { severity = "critical", → setting the labels so we can use them in the AlertManager service = "backend" } ANNOTATIONS { → information used in the alert event SUMMARY = "Container Instance down", DESCRIPTION = "Container Instance is down for more than 15 sec." }
  • 48. # file: alert-manager.yml global: → Global settings smtp_smarthost: 'mailslurper:2500' smtp_from: 'alertmanager@example.org' smtp_require_tls: false route: → Routing receiver: mail # Fallback → Fallback is there is no match routes: - match: severity: critical → Match on label! continue: true → Continue with other receivers if there is a match receiver: mail → Determine the receiver - match: severity: critical receiver: slack
  • 49. # file: alert-manager.yml (continued) receivers: - name: mail → mail receiver email_configs: - to: 'team-X+alerts@example.org' - name: slack → slack receiver slack_configs: - send_resolved: true username: 'AlertManager' channel: '#alert' api_url: 'THIS IS A VERY SECRET URL :)’
  • 50. # file: prometheus.yml global: scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. rule_files: - "alert.rules" # some settings intentionally removed!!
  • 51. Code Demo “Alerting -> The Alert Manager”
  • 52. Instrumenting your own code! ● Counter ○ A cumulative metric that represents a single numerical value that only ever goes up ● Gauge ○ Single numerical value that can arbitrarily go up and down ● Histogram ○ Samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values ● Summary ○ Histogram + total count of observations + sum of all observed values, it calculates configurable quantiles over a sliding time window
  • 53. Available Languages ● Official ○ Go, Java or Scala, Python, Ruby ● Unofficial ○ Bash, C++, Common Lisp, Elixir, Erlang, Haskell, Lua for Nginx, Lua for Tarantool, .NET / C#, Node.js, PHP, Rust // Spring Boot example -> file: build.gradle dependencies { compile('org.springframework.boot:spring-boot-starter-web') testCompile('org.springframework.boot:spring-boot-starter-test') compile('io.prometheus:simpleclient_spring_boot:0.0.21') → Add dependency }
  • 54. Prometheus Client Libaries: SpringBoot Example @EnablePrometheusEndpoint @EnableSpringBootMetricsCollector @RestController @SpringBootApplication public class DemoApplication { public static void main(String[] args) { SpringApplication.run(DemoApplication.class, args); } static final Counter requests = Counter.build() → create metric type counter .name("helloworld_requests_total") → set metric name .help("HelloWorld Total requests.").register(); → register the metric @RequestMapping("/helloworld") String home() { requests.inc(); → increment the counter with 1 (helloworld_requests_total) return "Hello World!"; } }
  • 60. Demo: Consul integration Register the services with Consul and Monitor 1 2
  • 61. Code Demo “Consul to the rescue”
  • 62.
  • 63. That’s a wrap! Question? https://github.com/mpas/infrastructure-and-system-monitoring-using-prometheus Marco Pas Philips Lighting Software geek, hands on Developer/Architect/DevOps Engineer @marcopas