Time to say goodbye to your Nagios-based setup

So you want to switch off?
@olivjan - ojan@monitoring-fr.org © 2014 - Olivier Jan - Check my Website

About me
❖ System admin and architect
❖ Co-founder of "Communauté Francophone de la Supervision Libre"
❖ Author of the book "Nagios 3 au coeur de la supervision Open Source"
❖ Co-founder of Check my Website, a SaaS service for remote monitoring of websites and applications (current role)

Content
❖ Why switch off? The good, and maybe not so good, reasons to do so!
❖ Which way to take?
❖ Building a monitoring solution without Nagios:
  ❖ Tools available
  ❖ A personal work in progress
❖ Migrating from Nagios to this kind of solution

Some reasons to switch off…
❖ Is the godfather of OSS monitoring dead as an open source project?
❖ You can't do any better with it
❖ Cool new kids out there
❖ Better "cloud" support
❖ A clear distinction between monitoring states, metrics and messages
❖ Better charting solutions
❖ Near real-time monitoring
❖ Routing, aggregation, correlation…
❖ YOUR reasons ;)

Which way to take?
❖ The "four musketeers"
  ❖ Naemon
  ❖ Icinga 2
  ❖ Shinken
  ❖ Centreon
❖ Reboot from building blocks
  ❖ Collect
  ❖ Store
  ❖ Visualize
  ❖ Alert

Tools: Collecting metrics and messages
❖ Packetbeat (metrics & messages)
❖ Rsyslog, NXLog, syslog-ng (messages)
❖ sFlow Toolkit, Host sFlow
❖ Logstash-forwarder (messages)
❖ Collectd (metrics)
❖ Diamond (metrics)
❖ osquery, WMI (metrics)
❖ Three collection levels: network (sFlow), system and application

Tools: External collecting
❖ End-user perspective
❖ Checks performed as close to the end user as possible
❖ Application behavior
❖ Real User Monitoring
❖ WebPageTest
❖ Selenium
❖ PhantomasJS
❖ Boomerang
❖ Bucky
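Checking "from the end user's perspective" starts with timing a full page fetch from outside the infrastructure. The sketch below is a synthetic check, not Real User Monitoring (tools like Boomerang and Bucky measure real visitors in the browser instead); `check_url` is a hypothetical helper built only on the Python standard library, and it assumes one plain HTTP(S) GET is a fair proxy for a page view.

```python
import time
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 10.0) -> tuple[int, float]:
    """Fetch `url` once and return (HTTP status, elapsed seconds).

    A status of 0 means no HTTP response was received at all
    (DNS failure, connection refused, timeout...).
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # download the whole body, as a visitor's browser would
            status = response.status
    except urllib.error.HTTPError as err:
        # Must come before the generic handler: 4xx/5xx answers are
        # real HTTP responses, so keep their status code.
        status = err.code
    except OSError:
        # URLError, socket errors and timeouts are all OSError subclasses.
        status = 0
    return status, time.monotonic() - start
```

Run from several locations, `check_url("https://www.example.org/")` gives the availability and response-time pair that an external monitoring service would alert on; rendering time and third-party assets are out of scope here and are what WebPageTest or Selenium cover.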

Tools: Routing metrics and messages
❖ Messages: Logstash, Flume, Fluentd
❖ Metrics: StatsD
❖ Metrics: carbon-relay-ng
One or more messages can fire an event.
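"One or more messages can fire an event" usually means a counting rule over a sliding time window. The sketch below shows only the idea; `MessageRule` is a hypothetical class, not an API of Logstash, Flume or Fluentd, and it assumes a plain substring match is enough to select the messages of interest.

```python
from collections import deque


class MessageRule:
    """Fire an event when `threshold` matching messages arrive within `window` seconds."""

    def __init__(self, pattern: str, threshold: int, window: float) -> None:
        self.pattern = pattern
        self.threshold = threshold
        self.window = window
        self._times = deque()  # timestamps of recent matching messages

    def feed(self, timestamp: float, message: str) -> bool:
        """Return True when this message makes the rule fire."""
        if self.pattern not in message:
            return False
        self._times.append(timestamp)
        # Forget matches that fell out of the sliding window.
        while self._times and timestamp - self._times[0] > self.window:
            self._times.popleft()
        if len(self._times) >= self.threshold:
            self._times.clear()  # reset so one burst fires a single event
            return True
        return False
```

With `threshold=1` every matching message becomes an event; higher thresholds turn a burst of log lines (failed logins, 5xx responses) into one alertable event.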

Tools: Databases
❖ Graphite: the most widely used
❖ OpenTSDB: backed by HBase
❖ KairosDB: backed by Cassandra
❖ InfluxDB: the most promising?
❖ Elasticsearch: search index database

Tools: Visualizing metrics and messages
❖ Kibana
❖ Grafana
❖ Dashboard collections

Tools: Alerting
❖ Seyren: alerting dashboard for Graphite
❖ Cabot: get alerted when services go down or metrics go crazy
❖ Bosun: an advanced, open-source monitoring and alerting system
❖ Skyline: real-time anomaly detection system
❖ Oculus: anomaly correlation component of Etsy's Kale system
❖ Esper: Complex Event Processing (CEP) engine

The French Monitoring Community Xperience
❖ Reboot from building blocks
  ❖ Collect
  ❖ Store
  ❖ Visualize
  ❖ Alert

The French Monitoring Community Xperience
Is it working? What is not working?

Collecting metrics: Collectd
❖ InfluxDB Collectd proxy
  ❖ Written in Go, like InfluxDB
  ❖ Temporary solution
❖ Native Collectd plugin
LoadPlugin network
<Plugin network>
  # Address and port of the InfluxDB Collectd proxy
  Server "127.0.0.1" "8096"
</Plugin>
❖ PHP5-FPM metrics
❖ Nginx metrics
❖ MariaDB metrics
❖ System metrics
❖ <metricname>:<value>|<type>
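The `<metricname>:<value>|<type>` line above is the StatsD wire format. A minimal parser sketch, assuming only the counter (`c`), gauge (`g`) and timer (`ms`) types plus the optional `|@<rate>` sample-rate suffix; `parse_statsd_line` and `Metric` are hypothetical names, and sets are left out because their values are arbitrary strings.

```python
from typing import NamedTuple

# StatsD metric types handled by this sketch.
STATSD_TYPES = {"c": "counter", "g": "gauge", "ms": "timer"}


class Metric(NamedTuple):
    name: str
    value: float
    kind: str
    sample_rate: float = 1.0


def parse_statsd_line(line: str) -> Metric:
    """Parse '<metricname>:<value>|<type>', with an optional '|@<sample_rate>' suffix."""
    name, _, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        raise ValueError(f"malformed StatsD line: {line!r}")
    value, type_code = fields[0], fields[1]
    if type_code not in STATSD_TYPES:
        raise ValueError(f"unsupported StatsD type: {type_code!r}")
    sample_rate = 1.0
    if len(fields) > 2 and fields[2].startswith("@"):
        sample_rate = float(fields[2][1:])
    return Metric(name, float(value), STATSD_TYPES[type_code], sample_rate)
```

For example, `parse_statsd_line("api.latency:320|ms|@0.1")` yields a timer sampled at 10%, which a StatsD daemon would scale back up when aggregating.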

Collecting messages: Rsyslog
❖ Logs nearly ready for consumption
❖ Native distribution packages
❖ Nginx logs, MySQL slow query log
template(name="ls_json" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"@version\":\"1")
  constant(value="\",\"message\":\"") property(name="msg")
  constant(value="\",\"host\":\"") property(name="hostname")
  constant(value="\",\"severity\":\"") property(name="syslogseverity-text")
  constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
  constant(value="\",\"programname\":\"") property(name="programname")
  constant(value="\",\"procid\":\"") property(name="procid")
  constant(value="\"}\n")
}
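To see what the `ls_json` template hands to Logstash, here is a sketch that builds an equivalent document with Python's `json` module; `ls_json_record` is a hypothetical helper and the field values in any call are made up. `option.json="on"` makes rsyslog JSON-escape the property values, which `json.dumps` also does here.

```python
import json


def ls_json_record(timestamp: str, message: str, host: str, severity: str,
                   facility: str, programname: str, procid: str) -> str:
    """Build a JSON line with the same fields, in the same order, as the ls_json template."""
    record = {
        "@timestamp": timestamp,  # RFC 3339 form of timereported
        "@version": "1",
        "message": message,
        "host": host,
        "severity": severity,
        "facility": facility,
        "programname": programname,
        "procid": procid,
    }
    # Compact separators and no ASCII escaping, to stay close to rsyslog's output.
    return json.dumps(record, separators=(",", ":"), ensure_ascii=False) + "\n"
```

Because every line is a self-contained JSON object, the receiving side only needs a JSON codec and no grok parsing.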

Collecting at the network level: Packetbeat
❖ Dedicated agent
❖ Captures traffic for:
  ❖ HTTP
  ❖ MySQL
  ❖ PostgreSQL
  ❖ Redis

Routing messages: Logstash
❖ Inputs
❖ Codecs/filters
❖ Outputs
input {
  udp {
    port => 10514
    codec => "json"
    type => "syslog"
  }
}
filter {
  # This replaces the host field with the host that generated the message (sysloghost)
  if [sysloghost] {
    mutate {
      replace => [ "host", "%{sysloghost}" ]
      remove_field => "sysloghost"
    }
  }
}
output {
  elasticsearch { host => "localhost" }
}
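The input → filter → output flow of this configuration can be mirrored in a few lines of Python to make each stage explicit. This is an illustration only, not how Logstash is implemented; `decode`, `replace_host` and `pipeline` are hypothetical names, and the output stage is left to the caller instead of Elasticsearch.

```python
import json
from typing import Iterable, Iterator


def decode(datagram: str) -> dict:
    """udp input with codec => "json": one datagram becomes one event."""
    event = json.loads(datagram)
    event.setdefault("type", "syslog")  # type is only set when the event has none
    return event


def replace_host(event: dict) -> dict:
    """if [sysloghost] { mutate { replace host, then remove_field sysloghost } }"""
    sysloghost = event.get("sysloghost")
    if sysloghost is not None and sysloghost is not False:
        event["host"] = str(sysloghost)
        del event["sysloghost"]
    return event


def pipeline(datagrams: Iterable[str]) -> Iterator[dict]:
    """Decode then filter each datagram; the caller plays the output stage."""
    for datagram in datagrams:
        yield replace_host(decode(datagram))
```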

Routing metrics: StatsD
❖ Now a de facto protocol, with client libraries in most languages
❖ InfluxDB backend
❖ Collectd can act as a StatsD daemon (statsd plugin)
❖ Very easy to push metrics:
echo "foo:1|c" | nc -u -w0 127.0.0.1 8125
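The same push from application code is a single UDP datagram. A minimal sketch, assuming a StatsD-compatible daemon (StatsD itself or Collectd's statsd plugin) on the usual port 8125; `statsd_send` is a hypothetical helper, not a client library API.

```python
import socket


def statsd_send(name: str, value: float, kind: str = "c",
                host: str = "127.0.0.1", port: int = 8125) -> bytes:
    """Send one '<name>:<value>|<kind>' metric over UDP and return the payload sent."""
    payload = f"{name}:{value}|{kind}".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))  # fire and forget: no reply is expected
    return payload
```

`statsd_send("foo", 1)` sends exactly what the `echo | nc` line sends. UDP keeps the instrumented application unaffected when the daemon is down, at the cost of silently lost metrics.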

Storing metrics: InfluxDB
❖ Make it behave like Graphite
  ❖ graphite-api
  ❖ carbon-relay-ng
  ❖ graphite-influxdb
❖ Cluster, cluster, cluster
❖ Designed for events and metrics
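"Behaving like Graphite" mostly means speaking the Carbon plaintext protocol, one `<path> <value> <timestamp>` line per data point over TCP. A minimal sketch, assuming a Carbon-compatible listener on port 2003 (carbon-cache, carbon-relay-ng, or InfluxDB's Graphite input if it is enabled in its configuration); `graphite_line` and `send_to_carbon` are hypothetical helpers.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point as '<path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch seconds
    return f"{path} {value} {timestamp}\n"


def send_to_carbon(lines: list[str], host: str = "127.0.0.1", port: int = 2003) -> None:
    """Push plaintext lines to a Carbon-compatible TCP listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

Since the format is this simple, any tool already writing to Graphite can be pointed at the relay or at InfluxDB without code changes.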

Storing messages: Elasticsearch
❖ Index database
❖ Cluster, cluster, cluster
❖ Full text search
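Full-text search is one HTTP call to the `_search` endpoint. This sketch only prepares the request; it assumes Elasticsearch on `localhost:9200` and Logstash's default daily `logstash-*` indices, and `build_search` is a hypothetical helper.

```python
import json
import urllib.request


def build_search(query: str, index: str = "logstash-*",
                 es_url: str = "http://localhost:9200", size: int = 20) -> urllib.request.Request:
    """Prepare a full-text search request against Elasticsearch's _search endpoint."""
    body = {
        "query": {"query_string": {"query": query}},  # Lucene query syntax, as in Kibana
        "sort": [{"@timestamp": {"order": "desc"}}],   # newest messages first
        "size": size,
    }
    return urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `json.load(urllib.request.urlopen(req))["hits"]["hits"]` returns the matching messages, the same query a Kibana search bar issues.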

Visualizing at the network level: Packetbeat
❖ Modified version of Kibana 3
❖ Dashboards ready out of the box

Visualizing metrics: Grafana
❖ Compatible with:
  ❖ Graphite
  ❖ InfluxDB
  ❖ OpenTSDB
❖ Built on Kibana 3

Visualizing messages: Kibana 4
❖ Easy to install
❖ Interactive dashboards
❖ Multiple indices

What's missing? Wishes
❖ Alerting
❖ External monitoring
❖ A repository for dashboards…
❖ Making sense of metrics and messages

Alerting reboot
❖ Alert only on end-user problems, from an end-user perspective
❖ Alerts sent to IRC, chat channels…
❖ History-based alert thresholds instead of static thresholds
  ❖ Statistics functions
  ❖ Boolean conditions
  ❖ Dynamic thresholds
  ❖ Anomaly detection
  ❖ Standard deviation
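One simple way to get a dynamic, history-based threshold is a z-score: how many standard deviations the current value sits from the recent mean. This is only one of many anomaly detection techniques (Skyline, for instance, combines several); `dynamic_threshold_state` is a hypothetical helper, and the 2σ/3σ limits are arbitrary defaults, not recommendations from the talk.

```python
import statistics
from typing import Sequence

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios-style exit codes


def dynamic_threshold_state(history: Sequence[float], current: float,
                            warn_sigma: float = 2.0, crit_sigma: float = 3.0) -> int:
    """Compare `current` to a band derived from past values instead of a fixed limit."""
    if len(history) < 2:
        return OK  # not enough history to estimate a deviation
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        # A perfectly flat history: any change is infinitely many sigmas away.
        return OK if current == mean else CRITICAL
    z = abs(current - mean) / stdev  # distance from the mean, in standard deviations
    if z >= crit_sigma:
        return CRITICAL
    if z >= warn_sigma:
        return WARNING
    return OK
```

The history would typically come from Graphite or InfluxDB (the same hour last week, the last N points), so the threshold follows the metric's normal behaviour instead of a value guessed once.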

Coming from Nagios
❖ Graphios injects Nagios perfdata into Graphite or InfluxDB
❖ check_graphite lets Nagios query the Graphite API and alert based on history
❖ Logstash can send events to Nagios through NSCA
❖ Nagios logs in Kibana, parsed with the Grok pattern %{NAGIOSLOGLINE}
❖ Keep Nagios for states?
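The core of what Graphios does is turning Nagios perfdata (`label=value[UOM];warn;crit;min;max`, space separated, labels optionally single-quoted) into time-series points. The sketch below is not Graphios's code: `perfdata_to_graphite` is a hypothetical function, the `nagios.<host>.<service>.<label>` path scheme is an assumption, and units and thresholds are simply dropped.

```python
import re

# 'quoted label'=value or label=value; UOM and ;warn;crit;min;max are ignored.
PERFDATA_RE = re.compile(r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^\s=']+))=(?P<value>-?\d+(?:\.\d+)?)")


def sanitize(part: str) -> str:
    """Make one path component Graphite-safe (dots separate levels in Graphite)."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", part.strip())


def perfdata_to_graphite(host: str, service: str, perfdata: str, timestamp: int,
                         prefix: str = "nagios") -> list[str]:
    """Convert one Nagios perfdata string into Graphite plaintext lines."""
    lines = []
    for match in PERFDATA_RE.finditer(perfdata):
        label = match.group("qlabel") or match.group("label")
        path = ".".join(sanitize(p) for p in (prefix, host, service, label))
        lines.append(f"{path} {match.group('value')} {timestamp}\n")
    return lines
```

Feeding the check_http perfdata `time=0.123456s;1.000000;5.000000;0.000000 size=1234B;;;0` yields one `time` and one `size` series per host and service, which Grafana can chart and a history-based check can query.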

Questions?
@olivjan
ojan@monitoring-fr.org
