SlideShare a Scribd company logo
1 of 31
Building Disaster Recovery via
Resilience Engineering
Michael Kehoe
Staff SRE - LinkedIn
Tonight’s
agenda
1 Introductions
2 What is Resilience Engineering
3 The Problem Statement
4 Project Overview
5 Testing Process
6 Project Outcomes
7 Key Takeaways
8 Q&A
Introduction
Michael Kehoe
/USR/BIN/WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years
American
• Former Network Engineer at the
University of Queensland
Who are we?
PRODUCTION-SRE TEAM AT LINKEDIN
• Disaster Recovery Planning and
Automation
• Incident Response and Automation
• Visibility Engineering
• Reliability Principles
LinkedIn
EVOLUTION OF THE INFRASTRUCTURE
2003 2010 2011 2013 2014 2015
Active &
Passive
Active &
Active
Multi-colo 3-
way Active &
Active
Multi-colo n-
way Active &
Active
LinkedIn
2018
4 Data Centers 21 PoPs 1000+ services
What is Resilience
Engineering?
What is Resilience Engineering?
• Projects that directly demand increased
resilience from our applications and
infrastructure.
• Application Injection Failure
• Infrastructure Injection Failure
• Full Disaster-Recovery Tests
Problem Statement
How often have you heard stories where someone
thought they had a disaster strategy, never tested it and
it fails when you need it the most?
Problem Statement
• How do we ensure that we always have
disaster recovery ability without incident?
• How do we consistently test for disaster
recovery ability without disrupting the
company?
Project Overview
Project Overview
1
• Build a process (with Automation) to facilitate disaster recovery
• Operate the process on regular cadence
• Provide reporting on outcomes of tests with engineering executives
Testing Process
What is Load Testing?
5x a week Peak hour traffic Fixed SLA
LinkedIn Traffic-Tier
Border
Router IPVS ATS ATS Frontend
EDGE FABRIC
Stickyrouting
LinkedIn Traffic-Tier
Fabric
Buckets
1
91
2 3 10
92 93 100
LinkedIn Traffic-Tier
EDGE FABRIC
DC1
DC2
DC1 in Cookie
Got DC2 as secondary fabric
Gets
secondary
fabric for userStickyrouting
TrafficShift Architecture
Web
application
Salt master
Stickyrouting
ServiceCouchbase Backend Worker
Processes
FABRIC
BUCKETS
Load Testing
FABRIC
DC3
DC1 DC2
60%
Traffic
Percentage
Load Testing
22
Project Outcomes
Benefits of Load-testing
Capacity
Planning
Identify Bugs Confidence
Benefits of Load-testing
CAPACITY PLANNING
• Through this process, we continuously validate our infrastructure
capacity
• This is the best signal we can possibly get since we’re simulating a
real disaster
Benefits of Load-testing
IDENTIFY BUGS
2
• Some bugs are only found at high load (under duress)
• Helps find inefficiency’s that otherwise may not be found until it’s too late
• Gives us clues on how to make our code more resilient to potential failure
Benefits of Load-testing
CONFIDENCE
2
• Through load-testing, we’ve built confidence in our disaster recovery
strategy
• We understand exactly:
• What process to follow
• How long it takes to avert disaster
• What are the risks associated with a disaster incident
Key Takeaways
Key Takeaways
• Resilience Engineering is a must for
LinkedIn
• Design infrastructure to facilitate disaster
recovery
• Disaster-test regularly to avoid surprises
• Automate your testing/ process to reduce
engagement time
Q&A
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engineering

More Related Content

What's hot

DevOps Continuous Integration & Delivery - A Whitepaper by RapidValue
DevOps Continuous Integration & Delivery - A Whitepaper by RapidValueDevOps Continuous Integration & Delivery - A Whitepaper by RapidValue
DevOps Continuous Integration & Delivery - A Whitepaper by RapidValueRapidValue
 
How to plug the data gap in DevOps
How to plug the data gap in DevOpsHow to plug the data gap in DevOps
How to plug the data gap in DevOpsDeborah Schalm
 
ATAGTR2017 Security Testing / IoT Testing in Real World
ATAGTR2017 Security Testing / IoT Testing in Real WorldATAGTR2017 Security Testing / IoT Testing in Real World
ATAGTR2017 Security Testing / IoT Testing in Real WorldAgile Testing Alliance
 
DevOps the Big Picture for Testers by Joseph Ours
DevOps the Big Picture for Testers by Joseph OursDevOps the Big Picture for Testers by Joseph Ours
DevOps the Big Picture for Testers by Joseph OursQA or the Highway
 
Continuous Testing in DevOps
Continuous Testing in DevOpsContinuous Testing in DevOps
Continuous Testing in DevOpsTechWell
 
Scaling Enterprise DevOps with CloudBees
Scaling Enterprise DevOps with CloudBeesScaling Enterprise DevOps with CloudBees
Scaling Enterprise DevOps with CloudBeesDevOps.com
 
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBHOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBDevOpsDays Tel Aviv
 
DevOps and All the Continuouses w/ Helen Beal
DevOps and All the Continuouses w/ Helen BealDevOps and All the Continuouses w/ Helen Beal
DevOps and All the Continuouses w/ Helen BealSonatype
 
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).Hannes Lenke
 
The Art of Container Monitoring
The Art of Container MonitoringThe Art of Container Monitoring
The Art of Container MonitoringDerek Chen
 
Drive Continuous Delivery With Continuous Testing
Drive Continuous Delivery With Continuous TestingDrive Continuous Delivery With Continuous Testing
Drive Continuous Delivery With Continuous TestingCA Technologies
 
Secure your Azure and DevOps in a smart way
Secure your Azure and DevOps in a smart waySecure your Azure and DevOps in a smart way
Secure your Azure and DevOps in a smart wayEficode
 
¿Qué es DevOps y por qué es importante en el Ciclo de Software? por michelada.io
¿Qué es DevOps y por qué es importante en el Ciclo de Software? por michelada.io¿Qué es DevOps y por qué es importante en el Ciclo de Software? por michelada.io
¿Qué es DevOps y por qué es importante en el Ciclo de Software? por michelada.ioSoftware Guru
 
Engineering Trust in Your Automated Tests
Engineering Trust in Your Automated TestsEngineering Trust in Your Automated Tests
Engineering Trust in Your Automated TestsJyoti Mittal
 
Where Testers & QA Fit in the Story of DevOps
Where Testers & QA Fit in the Story of DevOpsWhere Testers & QA Fit in the Story of DevOps
Where Testers & QA Fit in the Story of DevOpsQASymphony
 
Designing for the internet - Page Objects for the Real World
Designing for the internet - Page Objects for the Real WorldDesigning for the internet - Page Objects for the Real World
Designing for the internet - Page Objects for the Real WorldQualitest
 
Why Serverless is scary without DevSecOps and Observability
Why Serverless is scary without DevSecOps and ObservabilityWhy Serverless is scary without DevSecOps and Observability
Why Serverless is scary without DevSecOps and ObservabilityEficode
 

What's hot (20)

DevOps Continuous Integration & Delivery - A Whitepaper by RapidValue
DevOps Continuous Integration & Delivery - A Whitepaper by RapidValueDevOps Continuous Integration & Delivery - A Whitepaper by RapidValue
DevOps Continuous Integration & Delivery - A Whitepaper by RapidValue
 
How to plug the data gap in DevOps
How to plug the data gap in DevOpsHow to plug the data gap in DevOps
How to plug the data gap in DevOps
 
ATAGTR2017 Security Testing / IoT Testing in Real World
ATAGTR2017 Security Testing / IoT Testing in Real WorldATAGTR2017 Security Testing / IoT Testing in Real World
ATAGTR2017 Security Testing / IoT Testing in Real World
 
DevOps the Big Picture for Testers by Joseph Ours
DevOps the Big Picture for Testers by Joseph OursDevOps the Big Picture for Testers by Joseph Ours
DevOps the Big Picture for Testers by Joseph Ours
 
Continuous Testing in DevOps
Continuous Testing in DevOpsContinuous Testing in DevOps
Continuous Testing in DevOps
 
Scaling Enterprise DevOps with CloudBees
Scaling Enterprise DevOps with CloudBeesScaling Enterprise DevOps with CloudBees
Scaling Enterprise DevOps with CloudBees
 
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBHOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
 
Building DevOps Toolchain
Building DevOps ToolchainBuilding DevOps Toolchain
Building DevOps Toolchain
 
DevOps and All the Continuouses w/ Helen Beal
DevOps and All the Continuouses w/ Helen BealDevOps and All the Continuouses w/ Helen Beal
DevOps and All the Continuouses w/ Helen Beal
 
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
Reliability (R)evolution: Turning the DevOps World Upside Down (Again).
 
The Art of Container Monitoring
The Art of Container MonitoringThe Art of Container Monitoring
The Art of Container Monitoring
 
Drive Continuous Delivery With Continuous Testing
Drive Continuous Delivery With Continuous TestingDrive Continuous Delivery With Continuous Testing
Drive Continuous Delivery With Continuous Testing
 
A True Story of Why QA Loves DevOps
A True Story of Why QA Loves DevOpsA True Story of Why QA Loves DevOps
A True Story of Why QA Loves DevOps
 
Kku2011
Kku2011Kku2011
Kku2011
 
Secure your Azure and DevOps in a smart way
Secure your Azure and DevOps in a smart waySecure your Azure and DevOps in a smart way
Secure your Azure and DevOps in a smart way
 
¿Qué es DevOps y por qué es importante en el Ciclo de Software? por michelada.io
¿Qué es DevOps y por qué es importante en el Ciclo de Software? por michelada.io¿Qué es DevOps y por qué es importante en el Ciclo de Software? por michelada.io
¿Qué es DevOps y por qué es importante en el Ciclo de Software? por michelada.io
 
Engineering Trust in Your Automated Tests
Engineering Trust in Your Automated TestsEngineering Trust in Your Automated Tests
Engineering Trust in Your Automated Tests
 
Where Testers & QA Fit in the Story of DevOps
Where Testers & QA Fit in the Story of DevOpsWhere Testers & QA Fit in the Story of DevOps
Where Testers & QA Fit in the Story of DevOps
 
Designing for the internet - Page Objects for the Real World
Designing for the internet - Page Objects for the Real WorldDesigning for the internet - Page Objects for the Real World
Designing for the internet - Page Objects for the Real World
 
Why Serverless is scary without DevSecOps and Observability
Why Serverless is scary without DevSecOps and ObservabilityWhy Serverless is scary without DevSecOps and Observability
Why Serverless is scary without DevSecOps and Observability
 

Similar to SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engineering

The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringMichael Kehoe
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleMichael Kehoe
 
Microdeployments for microservices dev ops nashville
Microdeployments for microservices   dev ops nashvilleMicrodeployments for microservices   dev ops nashville
Microdeployments for microservices dev ops nashvilleNathaniel (Ned) Bauerle
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Mike Villiger
 
DevSecOps - It can change your life (cycle)
DevSecOps - It can change your life (cycle)DevSecOps - It can change your life (cycle)
DevSecOps - It can change your life (cycle)Qualitest
 
Building and Scaling High Performing Technology Organizations by Jez Humble a...
Building and Scaling High Performing Technology Organizations by Jez Humble a...Building and Scaling High Performing Technology Organizations by Jez Humble a...
Building and Scaling High Performing Technology Organizations by Jez Humble a...Agile India
 
Continuous Delivery for people who do not write code - Matthew Skelton - Conflux
Continuous Delivery for people who do not write code - Matthew Skelton - ConfluxContinuous Delivery for people who do not write code - Matthew Skelton - Conflux
Continuous Delivery for people who do not write code - Matthew Skelton - ConfluxMatthew Skelton
 
Getting Started with ThousandEyes Proof of Concepts
Getting Started with ThousandEyes Proof of ConceptsGetting Started with ThousandEyes Proof of Concepts
Getting Started with ThousandEyes Proof of ConceptsThousandEyes
 
Agile and Continuous Delivery for Audits and Exams - DC Continuous Delivery M...
Agile and Continuous Delivery for Audits and Exams - DC Continuous Delivery M...Agile and Continuous Delivery for Audits and Exams - DC Continuous Delivery M...
Agile and Continuous Delivery for Audits and Exams - DC Continuous Delivery M...Simon Storm
 
DevOps in Practice
DevOps in PracticeDevOps in Practice
DevOps in PracticeDerek Chen
 
implanting DevOps at scale using dynamic test environments
implanting DevOps at scale using dynamic test environmentsimplanting DevOps at scale using dynamic test environments
implanting DevOps at scale using dynamic test environmentsQualiQuali
 
Implementing DevOps at Scale Using Dynamic Environments
Implementing DevOps at Scale Using Dynamic EnvironmentsImplementing DevOps at Scale Using Dynamic Environments
Implementing DevOps at Scale Using Dynamic EnvironmentsSauce Labs
 
The DevOps journey in an Enterprise - CoDe-Conf. Stockholm September 14, 2017
The DevOps journey in an Enterprise - CoDe-Conf. Stockholm September 14, 2017The DevOps journey in an Enterprise - CoDe-Conf. Stockholm September 14, 2017
The DevOps journey in an Enterprise - CoDe-Conf. Stockholm September 14, 2017Anders Lundsgård
 
Operating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud MicroservicesOperating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud MicroservicesNoriaki Tatsumi
 
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...Curiosity Software Ireland
 
How to Build a Metrics-optimized Software Delivery Pipeline
How to Build a Metrics-optimized Software Delivery PipelineHow to Build a Metrics-optimized Software Delivery Pipeline
How to Build a Metrics-optimized Software Delivery PipelineDynatrace
 

Similar to SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engineering (20)

The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 
Microdeployments for microservices dev ops nashville
Microdeployments for microservices   dev ops nashvilleMicrodeployments for microservices   dev ops nashville
Microdeployments for microservices dev ops nashville
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
 
Forget about Agile
Forget about AgileForget about Agile
Forget about Agile
 
DevSecOps - It can change your life (cycle)
DevSecOps - It can change your life (cycle)DevSecOps - It can change your life (cycle)
DevSecOps - It can change your life (cycle)
 
Quality 4.0 and reimagining quality
Quality 4.0 and reimagining qualityQuality 4.0 and reimagining quality
Quality 4.0 and reimagining quality
 
Building and Scaling High Performing Technology Organizations by Jez Humble a...
Building and Scaling High Performing Technology Organizations by Jez Humble a...Building and Scaling High Performing Technology Organizations by Jez Humble a...
Building and Scaling High Performing Technology Organizations by Jez Humble a...
 
Software Lifecycle
Software LifecycleSoftware Lifecycle
Software Lifecycle
 
Continuous Delivery for people who do not write code - Matthew Skelton - Conflux
Continuous Delivery for people who do not write code - Matthew Skelton - ConfluxContinuous Delivery for people who do not write code - Matthew Skelton - Conflux
Continuous Delivery for people who do not write code - Matthew Skelton - Conflux
 
Building Next Gen Applications and Microservices
Building Next Gen Applications and Microservices Building Next Gen Applications and Microservices
Building Next Gen Applications and Microservices
 
Getting Started with ThousandEyes Proof of Concepts
Getting Started with ThousandEyes Proof of ConceptsGetting Started with ThousandEyes Proof of Concepts
Getting Started with ThousandEyes Proof of Concepts
 
Agile and Continuous Delivery for Audits and Exams - DC Continuous Delivery M...
Agile and Continuous Delivery for Audits and Exams - DC Continuous Delivery M...Agile and Continuous Delivery for Audits and Exams - DC Continuous Delivery M...
Agile and Continuous Delivery for Audits and Exams - DC Continuous Delivery M...
 
DevOps in Practice
DevOps in PracticeDevOps in Practice
DevOps in Practice
 
implanting DevOps at scale using dynamic test environments
implanting DevOps at scale using dynamic test environmentsimplanting DevOps at scale using dynamic test environments
implanting DevOps at scale using dynamic test environments
 
Implementing DevOps at Scale Using Dynamic Environments
Implementing DevOps at Scale Using Dynamic EnvironmentsImplementing DevOps at Scale Using Dynamic Environments
Implementing DevOps at Scale Using Dynamic Environments
 
The DevOps journey in an Enterprise - CoDe-Conf. Stockholm September 14, 2017
The DevOps journey in an Enterprise - CoDe-Conf. Stockholm September 14, 2017The DevOps journey in an Enterprise - CoDe-Conf. Stockholm September 14, 2017
The DevOps journey in an Enterprise - CoDe-Conf. Stockholm September 14, 2017
 
Operating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud MicroservicesOperating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud Microservices
 
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
 
How to Build a Metrics-optimized Software Delivery Pipeline
How to Build a Metrics-optimized Software Delivery PipelineHow to Build a Metrics-optimized Software Delivery Pipeline
How to Build a Metrics-optimized Software Delivery Pipeline
 

More from Michael Kehoe

Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayMichael Kehoe
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container BasicsMichael Kehoe
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsMichael Kehoe
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFMichael Kehoe
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsMichael Kehoe
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInMichael Kehoe
 
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...Michael Kehoe
 
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInCouchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInMichael Kehoe
 
Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016Michael Kehoe
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsMichael Kehoe
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe
 

More from Michael Kehoe (20)

eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortems
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container Basics
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
 
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
 
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInCouchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
 
Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production Systems
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 

Recently uploaded

Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...HenryBriggs2
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 

Recently uploaded (20)

Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 

SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engineering

Editor's Notes

  1. Anil TrafficShift is a two part application - A web application provides easy way for engineers to create planned and emergency offline plans. We leverage couchbase as our key/value persistence store Python backend worker processes talks to Salt Master via Salt API And instructs stickyrouting service to turn buckets online and offline We leverage this toolset to run load tests or stress tests of our datacenters Uff that’s a lot of talk, how to mitigate issues by doing trafficshift. But if you keenly observe, we are migrating live traffic across datacenter, why not leverage the same to stress test datacenter ? How awesome is that ? Not stress test single service, stress the whole system. I am gonna talk about load testing next.
  2. Anil As you can see by turning precise number of buckets offline in US-West and US-East - we can reroute that extra traffic to Target datacenter We do this in a pretty controlled manner in steps until the threshold level of 50% is reached. If for any reason, an alert fires during this stress test, our TrafficShift tool acknowledges that automatically rebalances the site traffic, sends out the stress test report to SREs