SlideShare a Scribd company logo
1 of 29
Michael Kehoe
Senior Site Reliability Engineer
LinkedIn
Using SaltStack to Auto Triage and
Remediate Production Systems
Topics
• $ whoami
• Salt @ LinkedIn
• LinkedIn’s auto remediation story
• Nurse
• Salt API
• Auto-remediation Salt Modules
• How to get started
$ whoami
Salt @ LinkedIn
• 11-14k minions per master
• Up to 8.5 events/ second per master
• Deployment system heavily utilizes Salt
• 5 dedicated SRE’s work on LinkedIn’s Salt infrastructure
• Number of SRE’s who also contribute
Alert Growth vs NOC Engineers
LinkedIn’s Auto-remediation Story
0
5
10
15
20
25
30
0
5000
10000
15000
20000
25000
30000
35000
40000
2010 2011 2012 2013 2014 2015 2016 2017
Num of Alerts
Num of Engineers
The problem
LinkedIn’s Auto-remediation Story
• Growing number of high priority alerts vs Engineers to watch
• Complicated ITR’s (runbooks) took time for NOC engineers to
execute and escalate
• SRE’s would forget to run diagnostic tooling during outages
• Longer MTTR == Bad experience for members
Building the Solution
LinkedIn’s Auto-Remediation Story
• LinkedIn needed a workflow engine
• Requirements
• Take automated actions against applications/ hosts
• Perform the run book automatically
• Appropriate auditing
• Scalable
What’s already out there
Autoremediation
• StackStorm
• fbar
• SaltStack
Started building an in-house solution in Mid-2014
Nurse
Image: freeflaticons.com
Nurse
• ‘Event based’ system
• Built on top of LinkedIn’s existing monitoring infrastructure
• Allows for manual execution via Web UI
• Allows for other systems to plug-in via REST API
• Built in rules-engine
• Allows for complicated checks/ branching
• Global environment awareness
• Scalable!
Nurse
Nurse
The basics
Salt REST API
• RESTful interface to SALT
• Allows for external-authentication
• PAM
• LDAP
• Your own plugin…
• Built on top of CherryPy
Interface
Salt REST API
• /login - Log in to receive a session token
• /logout - Remove or invalidate sessions
• /minions – Working directly with minions
• /jobs - Getting lists of previously run jobs or getting the return from a single job
• /run - Run commands without normal session handling
• /events - Expose the Salt event bus
• /hook - A generic web hook entry point that fires an event on Salt's event bus
• /keys – Wrapper around key management
• /ws - Open a WebSocket connection to Salt's event bus
• /stats - Return a dump of statistics collected from the CherryPy server
https://docs.saltstack.com/en/latest/ref/netapi/all/salt.netapi.rest_cherrypy.html
Configuration
Salt REST API
external_auth:
ldap:
headless-nurse:
- '*':
- test.*
- nurse.*
- '@jobs'
https://docs.saltstack.com/en/latest/ref/netapi/all/salt.netapi.rest_cherrypy.html
/etc/salt/master.d/salt-api.conf
Configuration
Salt REST API
rest_cherrypy:
port: 8888
ssl_crt: /etc/salt/pki/api/salt-api.crt
ssl_key: /etc/salt/pki/api/salt-api.key
https://docs.saltstack.com/en/latest/ref/netapi/all/salt.netapi.rest_cherrypy.html
/etc/salt/master.d/salt-api.conf
Configuration
Salt REST API
auth.ldap.basedn: 'dc=example,dc=com'
auth.ldap.binddn: ’readonly-ldap@example'
auth.ldap.bindpw: ’password'
auth.ldap.filter: sAMAccountName={{ username }}
auth.ldap.server: ldap.example.com
auth.ldap.tls: True
auth.ldap.persontype: 'person'
auth.ldap.groupclass: 'group'
auth.ldap.groupou: 'Users’
https://docs.saltstack.com/en/latest/topics/eauth/index.html
/etc/salt/master.d/salt-api.conf
Code Interfaces
Salt REST API
• There is a Python interface to the Salt API – pepper
• Provides basic wrapper around REST API
• https://github.com/saltstack/pepper/
• See https://github.com/SUSE/salt-netapi-client for JAVA
bindings
Python Code example
Salt REST API
from pepper.libpepper import Pepper
api = Pepper('https://salt.example.com:8888')
api.login('saltdev', 'saltdev', ’ldap')
# Run simple function
api.low([{'client': 'local', 'tgt': '*', 'fun': 'test.ping'}])
# Execute a runner function
api.runner('jobs.lookup_jid', jid=12345)
Goals
Auto-remediation Salt Modules
• Auto-triage issues
• Reduce context-switching
• Run labor intensive tasks automatically
• Gather data while engineer is logging in after escalation
• Auto-remediate issues
• Let the engineer sleep
• Faster MTTR
Implementation
Auto-remediation Salt Modules
• Auto-triage issues
• Identify abusive clients
• Collect & analyses thread/ heap dumps
• Parse & summarize log files
Implementation
Auto-remediation Salt Modules
• Auto-remediate issues
• Restart/ OOR applications
• Scale-up applications
• Blocking abusive clients
• Update A/ B experiment definitions
• Otherwise escalate…
Implementation
Auto-remediation Salt Modules
• Plenty of Salt modules/ runners already out there
• Over 300 modules are available in Salt core
• Modules to take remediate & notify
• Write your own
• Make sure you test!
Success
LinkedIn’s Auto-remediation Story
• 854k actions taken
• 100% of service health check alerts are on boarded
• ~37k man hours have now been automated
• Now automating ~1100 hours/ week
Success
LinkedIn’s Auto-remediation Story
0
5
10
15
20
25
30
0
5000
10000
15000
20000
25000
30000
35000
40000
2010 2011 2012 2013 2014 2015 2016 2017
Num of Alerts
Num of Engineers
Some lessons learnt
How to get started
• Need to make decision on how you architecture YOUR ‘event
bus’
• Use external monitoring to trigger Salt via API
• Use reactors/ beacons internally within Salt’s event bus
• See ‘Thorium’ documentation for Salt’s new reactor engine
• Auditing is important!
• Need to know what/ who triggered actions
Some lessons learnt
How to get started
• Reporting is important!
• Need to know how often automated actions are being taken
• Find failure hotspots
• Leverage event-bus or returners
• Safety first
• Don’t give yourself too much rope…
Conclusion
• Think carefully about your ‘Event-Driven-Automation’
architecture
• The sky is the limit with Salt…don’t limit yourself
• Again…safety first
Questions?
29
Thank You

More Related Content

What's hot

Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeRoland Kuhn
 
URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know confluent
 
Span Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logSpan Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logAlexander Dean
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environmentconfluent
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaAlexander Dean
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...HostedbyConfluent
 
Chef Analytics Webinar
Chef Analytics WebinarChef Analytics Webinar
Chef Analytics WebinarJames Casey
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...confluent
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7Site24x7
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian PerformanceAtlassian
 
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandJIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandAtlassian
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EnginePraveen Thirukonda
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applicationsconfluent
 
Chef Analytics (Chef NYC Meeting - July 2014)
Chef Analytics (Chef NYC Meeting - July 2014)Chef Analytics (Chef NYC Meeting - July 2014)
Chef Analytics (Chef NYC Meeting - July 2014)James Casey
 
Agile infrastructure
Agile infrastructureAgile infrastructure
Agile infrastructureTarun Rajput
 
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...HostedbyConfluent
 
Tale of two streaming frameworks (Karthik D - Walmart)
Tale of two streaming frameworks (Karthik D - Walmart)Tale of two streaming frameworks (Karthik D - Walmart)
Tale of two streaming frameworks (Karthik D - Walmart)KafkaZone
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Tyler Nguyen
 
Redgate Database Devops Demo webinar - Visual Studio Team Services - 21st Fe...
Redgate Database Devops Demo webinar  - Visual Studio Team Services - 21st Fe...Redgate Database Devops Demo webinar  - Visual Studio Team Services - 21st Fe...
Redgate Database Devops Demo webinar - Visual Studio Team Services - 21st Fe...KateDuggan2
 

What's hot (20)

Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in Practice
 
URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know
 
Span Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logSpan Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified log
 
Intro to.net core 20170111
Intro to.net core   20170111Intro to.net core   20170111
Intro to.net core 20170111
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in Scala
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
 
Chef Analytics Webinar
Chef Analytics WebinarChef Analytics Webinar
Chef Analytics Webinar
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance
 
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandJIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation Engine
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
 
Chef Analytics (Chef NYC Meeting - July 2014)
Chef Analytics (Chef NYC Meeting - July 2014)Chef Analytics (Chef NYC Meeting - July 2014)
Chef Analytics (Chef NYC Meeting - July 2014)
 
Agile infrastructure
Agile infrastructureAgile infrastructure
Agile infrastructure
 
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
Becoming Protocol-Agnostic with Kafka, REST, GraphQL & gRPC | Tyler Mills, Sm...
 
Tale of two streaming frameworks (Karthik D - Walmart)
Tale of two streaming frameworks (Karthik D - Walmart)Tale of two streaming frameworks (Karthik D - Walmart)
Tale of two streaming frameworks (Karthik D - Walmart)
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
 
Redgate Database Devops Demo webinar - Visual Studio Team Services - 21st Fe...
Redgate Database Devops Demo webinar  - Visual Studio Team Services - 21st Fe...Redgate Database Devops Demo webinar  - Visual Studio Team Services - 21st Fe...
Redgate Database Devops Demo webinar - Visual Studio Team Services - 21st Fe...
 

Viewers also liked

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Michael Kehoe
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialPooja Tangi
 
Event driven-automation and workflows
Event driven-automation and workflowsEvent driven-automation and workflows
Event driven-automation and workflowsDmitri Zimine
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Issa Fattah
 
SaltConf 2015: Salt stack at web scale: Better, Stronger, Faster
SaltConf 2015: Salt stack at web scale: Better, Stronger, FasterSaltConf 2015: Salt stack at web scale: Better, Stronger, Faster
SaltConf 2015: Salt stack at web scale: Better, Stronger, FasterThomas Jackson
 
Real-time Infrastructure Management with SaltStack - OpenWest 2013
Real-time Infrastructure Management with SaltStack - OpenWest 2013Real-time Infrastructure Management with SaltStack - OpenWest 2013
Real-time Infrastructure Management with SaltStack - OpenWest 2013SaltStack
 
Creating SaltStack State data with Pyobjects
Creating SaltStack State data with PyobjectsCreating SaltStack State data with Pyobjects
Creating SaltStack State data with PyobjectsEvan Borgstrom
 
PuppetConf 2016: Keynote: Pulling the Strings to Containerize Your Life - Sco...
PuppetConf 2016: Keynote: Pulling the Strings to Containerize Your Life - Sco...PuppetConf 2016: Keynote: Pulling the Strings to Containerize Your Life - Sco...
PuppetConf 2016: Keynote: Pulling the Strings to Containerize Your Life - Sco...Puppet
 
Event-driven Infrastructure - Mike Place, SaltStack - DevOpsDays Tel Aviv 2016
Event-driven Infrastructure - Mike Place, SaltStack - DevOpsDays Tel Aviv 2016Event-driven Infrastructure - Mike Place, SaltStack - DevOpsDays Tel Aviv 2016
Event-driven Infrastructure - Mike Place, SaltStack - DevOpsDays Tel Aviv 2016DevOpsDays Tel Aviv
 
ContentCal AutoPilot
ContentCal AutoPilotContentCal AutoPilot
ContentCal AutoPilotAndy Lambert
 
Intelligent infrastructure with SaltStack
Intelligent infrastructure with SaltStackIntelligent infrastructure with SaltStack
Intelligent infrastructure with SaltStackLove Nyberg
 
OpenStack Journey in Tieto Elastic Cloud
OpenStack Journey in Tieto Elastic CloudOpenStack Journey in Tieto Elastic Cloud
OpenStack Journey in Tieto Elastic CloudJakub Pavlik
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errorsHimanshu
 
Operators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksOperators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksJakub Pavlik
 
Automate your development environment with Jira and Saltstack
Automate your development environment with Jira and SaltstackAutomate your development environment with Jira and Saltstack
Automate your development environment with Jira and SaltstackNetworkedAssets
 
DevOps on AWS: Deep Dive on Infrastructure as Code - Toronto
DevOps on AWS: Deep Dive on Infrastructure as Code - TorontoDevOps on AWS: Deep Dive on Infrastructure as Code - Toronto
DevOps on AWS: Deep Dive on Infrastructure as Code - TorontoAmazon Web Services
 
SaltStack Configuration Management
SaltStack Configuration ManagementSaltStack Configuration Management
SaltStack Configuration ManagementNathan Sickler
 
On the Importance of Infrastructure as Code
On the Importance of Infrastructure as CodeOn the Importance of Infrastructure as Code
On the Importance of Infrastructure as CodeKris Buytaert
 

Viewers also liked (20)

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potential
 
Event driven-automation and workflows
Event driven-automation and workflowsEvent driven-automation and workflows
Event driven-automation and workflows
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
 
SaltConf 2015: Salt stack at web scale: Better, Stronger, Faster
SaltConf 2015: Salt stack at web scale: Better, Stronger, FasterSaltConf 2015: Salt stack at web scale: Better, Stronger, Faster
SaltConf 2015: Salt stack at web scale: Better, Stronger, Faster
 
Real-time Infrastructure Management with SaltStack - OpenWest 2013
Real-time Infrastructure Management with SaltStack - OpenWest 2013Real-time Infrastructure Management with SaltStack - OpenWest 2013
Real-time Infrastructure Management with SaltStack - OpenWest 2013
 
Creating SaltStack State data with Pyobjects
Creating SaltStack State data with PyobjectsCreating SaltStack State data with Pyobjects
Creating SaltStack State data with Pyobjects
 
PuppetConf 2016: Keynote: Pulling the Strings to Containerize Your Life - Sco...
PuppetConf 2016: Keynote: Pulling the Strings to Containerize Your Life - Sco...PuppetConf 2016: Keynote: Pulling the Strings to Containerize Your Life - Sco...
PuppetConf 2016: Keynote: Pulling the Strings to Containerize Your Life - Sco...
 
Event-driven Infrastructure - Mike Place, SaltStack - DevOpsDays Tel Aviv 2016
Event-driven Infrastructure - Mike Place, SaltStack - DevOpsDays Tel Aviv 2016Event-driven Infrastructure - Mike Place, SaltStack - DevOpsDays Tel Aviv 2016
Event-driven Infrastructure - Mike Place, SaltStack - DevOpsDays Tel Aviv 2016
 
ContentCal AutoPilot
ContentCal AutoPilotContentCal AutoPilot
ContentCal AutoPilot
 
Intelligent infrastructure with SaltStack
Intelligent infrastructure with SaltStackIntelligent infrastructure with SaltStack
Intelligent infrastructure with SaltStack
 
OpenStack Journey in Tieto Elastic Cloud
OpenStack Journey in Tieto Elastic CloudOpenStack Journey in Tieto Elastic Cloud
OpenStack Journey in Tieto Elastic Cloud
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errors
 
Operators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksOperators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 Networks
 
Automate your development environment with Jira and Saltstack
Automate your development environment with Jira and SaltstackAutomate your development environment with Jira and Saltstack
Automate your development environment with Jira and Saltstack
 
DevOps on AWS: Deep Dive on Infrastructure as Code - Toronto
DevOps on AWS: Deep Dive on Infrastructure as Code - TorontoDevOps on AWS: Deep Dive on Infrastructure as Code - Toronto
DevOps on AWS: Deep Dive on Infrastructure as Code - Toronto
 
SaltStack Configuration Management
SaltStack Configuration ManagementSaltStack Configuration Management
SaltStack Configuration Management
 
On the Importance of Infrastructure as Code
On the Importance of Infrastructure as CodeOn the Importance of Infrastructure as Code
On the Importance of Infrastructure as Code
 

Similar to Using SaltStack to Auto Triage and Remediate Production Systems

Openfest15 MySQL Plugin Development
Openfest15 MySQL Plugin DevelopmentOpenfest15 MySQL Plugin Development
Openfest15 MySQL Plugin DevelopmentGeorgi Kodinov
 
CIRCUIT 2015 - Monitoring AEM
CIRCUIT 2015 - Monitoring AEMCIRCUIT 2015 - Monitoring AEM
CIRCUIT 2015 - Monitoring AEMICF CIRCUIT
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...rschuppe
 
Continuous Delivery: How RightScale Releases Weekly
Continuous Delivery: How RightScale Releases WeeklyContinuous Delivery: How RightScale Releases Weekly
Continuous Delivery: How RightScale Releases WeeklyRightScale
 
Getting started with Office 365 SharePoint 2010 online development
Getting started with Office 365 SharePoint 2010 online developmentGetting started with Office 365 SharePoint 2010 online development
Getting started with Office 365 SharePoint 2010 online developmentJeremy Thake
 
Server and application monitoring webinars [Applications Manager]: Part 1
Server and application monitoring webinars [Applications Manager]: Part 1Server and application monitoring webinars [Applications Manager]: Part 1
Server and application monitoring webinars [Applications Manager]: Part 1ManageEngine, Zoho Corporation
 
Azure Functions Real World Examples
Azure Functions Real World Examples Azure Functions Real World Examples
Azure Functions Real World Examples Yochay Kiriaty
 
SharePoint 2013 – the upgrade story
SharePoint 2013 – the upgrade storySharePoint 2013 – the upgrade story
SharePoint 2013 – the upgrade storySPC Adriatics
 
Server and application monitoring webinars [Applications Manager] - Part 2
Server and application monitoring webinars [Applications Manager] - Part 2Server and application monitoring webinars [Applications Manager] - Part 2
Server and application monitoring webinars [Applications Manager] - Part 2ManageEngine, Zoho Corporation
 
淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2Wen-Tien Chang
 
Agile startup company management and operation
Agile startup company management and operationAgile startup company management and operation
Agile startup company management and operationJiang Zhu
 
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web ScaleSaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web ScaleSaltStack
 
Single Page Applications: Your Browser is the OS!
Single Page Applications: Your Browser is the OS!Single Page Applications: Your Browser is the OS!
Single Page Applications: Your Browser is the OS!Jeremy Likness
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithMarkus Eisele
 
Automating the cip compliance test lab
Automating the cip compliance test labAutomating the cip compliance test lab
Automating the cip compliance test labChuck Reynolds
 
SAP portal: breaking and forensicating
SAP portal: breaking and forensicating SAP portal: breaking and forensicating
SAP portal: breaking and forensicating ERPScan
 
Five Ways to Fix Your SQL Server Dev-Test Problems
Five Ways to Fix Your SQL Server Dev-Test Problems Five Ways to Fix Your SQL Server Dev-Test Problems
Five Ways to Fix Your SQL Server Dev-Test Problems Catalogic Software
 

Similar to Using SaltStack to Auto Triage and Remediate Production Systems (20)

Openfest15 MySQL Plugin Development
Openfest15 MySQL Plugin DevelopmentOpenfest15 MySQL Plugin Development
Openfest15 MySQL Plugin Development
 
CIRCUIT 2015 - Monitoring AEM
CIRCUIT 2015 - Monitoring AEMCIRCUIT 2015 - Monitoring AEM
CIRCUIT 2015 - Monitoring AEM
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Continuous Delivery: How RightScale Releases Weekly
Continuous Delivery: How RightScale Releases WeeklyContinuous Delivery: How RightScale Releases Weekly
Continuous Delivery: How RightScale Releases Weekly
 
Redundant devops
Redundant devopsRedundant devops
Redundant devops
 
Getting started with Office 365 SharePoint 2010 online development
Getting started with Office 365 SharePoint 2010 online developmentGetting started with Office 365 SharePoint 2010 online development
Getting started with Office 365 SharePoint 2010 online development
 
Server and application monitoring webinars [Applications Manager]: Part 1
Server and application monitoring webinars [Applications Manager]: Part 1Server and application monitoring webinars [Applications Manager]: Part 1
Server and application monitoring webinars [Applications Manager]: Part 1
 
Azure Functions Real World Examples
Azure Functions Real World Examples Azure Functions Real World Examples
Azure Functions Real World Examples
 
SharePoint 2013 – the upgrade story
SharePoint 2013 – the upgrade storySharePoint 2013 – the upgrade story
SharePoint 2013 – the upgrade story
 
Server and application monitoring webinars [Applications Manager] - Part 2
Server and application monitoring webinars [Applications Manager] - Part 2Server and application monitoring webinars [Applications Manager] - Part 2
Server and application monitoring webinars [Applications Manager] - Part 2
 
淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2
 
Agile startup company management and operation
Agile startup company management and operationAgile startup company management and operation
Agile startup company management and operation
 
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web ScaleSaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
 
Single Page Applications: Your Browser is the OS!
Single Page Applications: Your Browser is the OS!Single Page Applications: Your Browser is the OS!
Single Page Applications: Your Browser is the OS!
 
Scale net apps in aws
Scale net apps in awsScale net apps in aws
Scale net apps in aws
 
Scale net apps in aws
Scale net apps in awsScale net apps in aws
Scale net apps in aws
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
 
Automating the cip compliance test lab
Automating the cip compliance test labAutomating the cip compliance test lab
Automating the cip compliance test lab
 
SAP portal: breaking and forensicating
SAP portal: breaking and forensicating SAP portal: breaking and forensicating
SAP portal: breaking and forensicating
 
Five Ways to Fix Your SQL Server Dev-Test Problems
Five Ways to Fix Your SQL Server Dev-Test Problems Five Ways to Fix Your SQL Server Dev-Test Problems
Five Ways to Fix Your SQL Server Dev-Test Problems
 

More from Michael Kehoe

Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayMichael Kehoe
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container BasicsMichael Kehoe
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsMichael Kehoe
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringMichael Kehoe
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFMichael Kehoe
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...Michael Kehoe
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsMichael Kehoe
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleMichael Kehoe
 

More from Michael Kehoe (17)

eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortems
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container Basics
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 

Recently uploaded

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 

Using SaltStack to Auto Triage and Remediate Production Systems

  • 1. Michael Kehoe Senior Site Reliability Engineer LinkedIn Using SaltStack to Auto Triage and Remediate Production Systems
  • 2. Topics • $ whoami • Salt @ LinkedIn • LinkedIn’s auto remediation story • Nurse • Salt API • Auto-remediation Salt Modules • How to get started
  • 4. Salt @ LinkedIn • 11-14k minions per master • Up to 8.5 events/ second per master • Deployment system heavily utilizes Salt • 5 dedicated SRE’s work on LinkedIn’s Salt infrastructure • Number of SRE’s who also contribute
  • 5. Alert Growth vs NOC Engineers LinkedIn’s Auto-remediation Story 0 5 10 15 20 25 30 0 5000 10000 15000 20000 25000 30000 35000 40000 2010 2011 2012 2013 2014 2015 2016 2017 Num of Alerts Num of Engineers
  • 6. The problem LinkedIn’s Auto-remediation Story • Growing number of high priority alerts vs Engineers to watch • Complicated ITR’s (runbooks) took time for NOC engineers to execute and escalate • SRE’s would forget to run diagnostic tooling during outages • Longer MTTR == Bad experience for members
  • 7. Building the Solution LinkedIn’s Auto-Remediation Story • LinkedIn needed a workflow engine • Requirements • Take automated actions against applications/ hosts • Perform the run book automatically • Appropriate auditing • Scalable
  • 8. What’s already out there Autoremediation • StackStorm • fbar • SaltStack Started building an in-house solution in Mid-2014
  • 10. Nurse • ‘Event based’ system • Built on top of LinkedIn’s existing monitoring infrastructure • Allows for manual execution via Web UI • Allows for other systems to plug-in via REST API • Built in rules-engine • Allows for complicated checks/ branching • Global environment awareness • Scalable!
  • 11. Nurse
  • 12. Nurse
  • 13. The basics Salt REST API • RESTful interface to SALT • Allows for external-authentication • PAM • LDAP • Your own plugin… • Built on top of CherryPy
  • 14. Interface Salt REST API • /login - Log in to receive a session token • /logout - Remove or invalidate sessions • /minions – Working directly with minions • /jobs - Getting lists of previously run jobs or getting the return from a single job • /run - Run commands without normal session handling • /events - Expose the Salt event bus • /hook - A generic web hook entry point that fires an event on Salt's event bus • /keys – Wrapper around key management • /ws - Open a WebSocket connection to Salt's event bus • /stats - Return a dump of statistics collected from the CherryPy server https://docs.saltstack.com/en/latest/ref/netapi/all/salt.netapi.rest_cherrypy.html
  • 15. Configuration Salt REST API external_auth: ldap: headless-nurse: - '*': - test.* - nurse.* - '@jobs' https://docs.saltstack.com/en/latest/ref/netapi/all/salt.netapi.rest_cherrypy.html /etc/salt/master.d/salt-api.conf
  • 16. Configuration Salt REST API rest_cherrypy: port: 8888 ssl_crt: /etc/salt/pki/api/salt-api.crt ssl_key: /etc/salt/pki/api/salt-api.key https://docs.saltstack.com/en/latest/ref/netapi/all/salt.netapi.rest_cherrypy.html /etc/salt/master.d/salt-api.conf
  • 17. Configuration Salt REST API auth.ldap.basedn: 'dc=example,dc=com' auth.ldap.binddn: ’readonly-ldap@example' auth.ldap.bindpw: ’password' auth.ldap.filter: sAMAccountName={{ username }} auth.ldap.server: ldap.example.com auth.ldap.tls: True auth.ldap.persontype: 'person' auth.ldap.groupclass: 'group' auth.ldap.groupou: 'Users’ https://docs.saltstack.com/en/latest/topics/eauth/index.html /etc/salt/master.d/salt-api.conf
  • 18. Code Interfaces Salt REST API • There is a Python interface to the Salt API – pepper • Provides basic wrapper around REST API • https://github.com/saltstack/pepper/ • See https://github.com/SUSE/salt-netapi-client for JAVA bindings
  • 19. Python Code example Salt REST API from pepper.libpepper import Pepper api = Pepper('https://salt.example.com:8888') api.login('saltdev', 'saltdev', ’ldap') # Run simple function api.low([{'client': 'local', 'tgt': '*', 'fun': 'test.ping'}]) # Execute a runner function api.runner('jobs.lookup_jid', jid=12345)
  • 20. Goals Auto-remediation Salt Modules • Auto-triage issues • Reduce context-switching • Run labor intensive tasks automatically • Gather data while engineer is logging in after escalation • Auto-remediate issues • Let the engineer sleep • Faster MTTR
  • 21. Implementation Auto-remediation Salt Modules • Auto-triage issues • Identify abusive clients • Collect & analyses thread/ heap dumps • Parse & summarize log files
  • 22. Implementation Auto-remediation Salt Modules • Auto-remediate issues • Restart/ OOR applications • Scale-up applications • Blocking abusive clients • Update A/ B experiment definitions • Otherwise escalate…
  • 23. Implementation Auto-remediation Salt Modules • Plenty of Salt modules/ runners already out there • Over 300 modules are available in Salt core • Modules to take remediate & notify • Write your own • Make sure you test!
  • 24. Success LinkedIn’s Auto-remediation Story • 854k actions taken • 100% of service health check alerts are on boarded • ~37k man hours have now been automated • Now automating ~1100 hours/ week
  • 26. Some lessons learnt How to get started • Need to make decision on how you architecture YOUR ‘event bus’ • Use external monitoring to trigger Salt via API • Use reactors/ beacons internally within Salt’s event bus • See ‘Thorium’ documentation for Salt’s new reactor engine • Auditing is important! • Need to know what/ who triggered actions
  • 27. Some lessons learnt How to get started • Reporting is important! • Need to know how often automated actions are being taken • Find failure hotspots • Leverage event-bus or returners • Safety first • Don’t give yourself too much rope…
  • 28. Conclusion • Think carefully about your ‘Event-Driven-Automation’ architecture • The sky is the limit with Salt…don’t limit yourself • Again…safety first

Editor's Notes

  1. WhoAmI Background on Salt & Auto-remediation at LinkedIn Nurse – LinkedIn’s Auto-remediation engine Salt API & Autoremediation Salt Modules How to get started
  2. Been at LinkedIn for nearly 2.5 years Apart of LinkedIn’s ARVT group Occasional committer to LinkedIn’s Salt ecosystem
  3. Want to give you some context into our infrastructure Average is 11-14k minions per production datacenter Up to 8.5 events/ second per master Deployment system heavily utilizes Salt 5 dedicated SRE’s work on LinkedIn’s Salt infrastructure Number of SRE’s who also contribute Keep these numbers in mind as the session progresses
  4. Let’s talk about LinkedIn’s autoremediation story Graph depicts the number of alerts our NOC watches vs the Number of NOC engineers we employee globally
  5. What is out there Has anyone heard of these
  6. Login/ logout Jobs Run Events Events – get data from event bus Hook – Fire an event INTO the salt bus WS – Open websocket Stats
  7. So what are the goals of building an auto-remediation engine
  8. Remediate Apache/ Nginx restarts Notify Slack/ Hipchat modules
  9. In 2014, we don’t increase headcount, now we’re lowering our NOC engineer headcount