Leveling Up Monitoring:
A Decade of Automating and
Scaling Nagios
Katherine Daniels and Laurie Denness
@beerops - @lozzd Velocity 2016
Katherine Daniels

@beerops
Senior Operations Engineer, Etsy
Co-Author of Effective DevOps
Laurie Denness
@lozzd
Staff Operations Engineer, Etsy
Official Graph Enthusiast
Agenda
1. In The Beginning...
2. Automation
3. Deployinator
4. Scaling + Tooling
About Etsy
• 25M Active Buyers
• 1.6M Active Sellers
• $2.39B 2015 Annual GMS
(As of March 31, 2016)
Monitoring!
bit.ly/yaynagios
https://kartar.net/2015/08/monitoring-survey-2015---tools/
In The Beginning
Sometimes your statement needs emphasis with
a black background.
LESSONS LEARNED:
Templates are awesome.
define service {
    use                  generic-service
    # Every host in Linux_hosts except members of email-only-servers
    hostgroups           Linux_hosts,!email-only-servers
    service_description  SSH
    check_command        check_ssh
}
define service {
    use             disk-space-service
    hostgroup_name  email-only-servers
    # Notifications for these hosts go to the ops_nonurgent contact group
    contact_groups  ops_nonurgent
}
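The "!" prefix in the first definition excludes a hostgroup from the list. A minimal Python sketch of that behaviour follows, treating each hostgroup as a set of hosts. It models the exclusion semantics only and is not Nagios's own parser; the function name expand_hostgroups and the host names are hypothetical, chosen for illustration.

```python
def expand_hostgroups(expression, hostgroups):
    """Resolve a Nagios-style hostgroup list such as 'a,!b' to a set of hosts."""
    included, excluded = set(), set()
    for term in expression.split(","):
        term = term.strip()
        if term.startswith("!"):
            # A leading "!" removes that group's members from the result.
            excluded |= hostgroups[term[1:]]
        else:
            included |= hostgroups[term]
    return included - excluded


# Hypothetical membership, for illustration only.
hostgroups = {
    "Linux_hosts": {"web01", "db01", "mail01"},
    "email-only-servers": {"mail01"},
}

ssh_hosts = expand_hostgroups("Linux_hosts,!email-only-servers", hostgroups)
print(sorted(ssh_hosts))  # ['db01', 'web01']
```

In this sketch mail01 drops out of the SSH check and is picked up only by the second definition, which targets email-only-servers directly.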
LESSONS LEARNED:
Start small.
Nagios and Chef
LESSONS LEARNED:
Automation is awesome!
HA HA JUST KIDDING
LESSONS LEARNED:
Trust but verify.
How Many Repos?
LESSONS LEARNED:
?!?!?!?!??!?!
LESSONS LEARNED:
Try, fail, learn, and try again.
Problems
• Four git repos, inconsistent mess, duplication
• Broken semi-useful automation - need to regain trust
• Some shared config, some unique
• Gain confidence in changes
• Stop editing on the production box
Nagios and Chef
and Deployinator!
Solution 1:
Merge everything: find and remove duplication,
shared configs
Thanks Murphy!
Super Secret Option!!!
Solution 2:
Use Jenkins CI to test changes before
production
Solution 3:
Use Deployinator to run Chef recipe to generate
automated configs
Solution 4:
Use Deployinator to rsync config to all boxes
• git pull repo on deploy host
• Run Chef recipe to add automated pieces
• Re-run the try-nagios script against that
• rsync copy from deploy box to Nagios hosts
• Create symlink for nagios.cfg
• Restart Nagios
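The six steps above can be sketched as an ordered list of commands. This is a rough Python illustration under stated assumptions, not Deployinator's actual implementation: the Chef recipe name, the directory paths, and the symlink source and target are all hypothetical, and "nagios -v" (Nagios's built-in config check) stands in for the try-nagios script, whose interface the slides do not show.

```python
import shlex
import subprocess


def build_deploy_plan(config_dir, nagios_hosts, remote_dir="/etc/nagios/deployed"):
    """Return the deploy steps as argv lists, in the order the slides give."""
    plan = [
        # 1. git pull repo on deploy host
        ["git", "-C", config_dir, "pull"],
        # 2. Run Chef recipe to add automated pieces (recipe name is made up)
        ["chef-client", "--override-runlist", "recipe[nagios::generated_configs]"],
        # 3. Validate the merged result; stand-in for the try-nagios script
        ["nagios", "-v", f"{config_dir}/nagios.cfg"],
    ]
    for host in nagios_hosts:
        plan += [
            # 4. rsync copy from deploy box to Nagios hosts
            ["rsync", "-a", "--delete", f"{config_dir}/", f"{host}:{remote_dir}/"],
            # 5. Create symlink for nagios.cfg (both paths are assumptions)
            ["ssh", host, "ln", "-sfn", f"{remote_dir}/nagios.cfg", "/etc/nagios/nagios.cfg"],
            # 6. Restart Nagios
            ["ssh", host, "service", "nagios", "restart"],
        ]
    return plan


def run_plan(plan, run=subprocess.run):
    """Run each step in order; check=True aborts on the first failure."""
    for argv in plan:
        run(argv, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in build_deploy_plan("/srv/nagios-config", ["nagios01.example.com"]):
        print(shlex.join(argv))
```

Because run_plan stops at the first failing command, a config that fails validation in step 3 never reaches the rsync, symlink, or restart steps.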
LESSONS LEARNED:
Use the tools you have.
Scaling things up!
Core Workers
LESSONS LEARNED:
If at first you don’t succeed,
rub some webscale on it.
Iterating and Iterating
LESSONS LEARNED:
Iterate
Iterate
Iterate
To Infinity and Beyond
http://github.com/etsy/opsweekly
Final Lessons Learned
• Templates are awesome
• Start small
• Automation is awesome
• Trust but verify
• Learn from (y)our mistakes
• Iterate on the tools you have
Open Source Summary
• http://github.com/etsy/deployinator
• http://github.com/etsy/pushbot
• http://github.com/etsy/trylib
• http://github.com/etsy/opsweekly
• http://github.com/etsy/nagios-herald
• http://github.com/RJ/irccat
THANK YOU!