Helping operations top-heavy teams the smart way
Jeff Weiner
Chief Executive Officer
Michael Kehoe
Staff Site Reliability Engineer
Todd Palino
Sr Staff Site Reliability Engineer
This Is The Only Slide You May Need a Picture Of
slideshare.net/ToddPalino slideshare.net/MichaelKehoe3
Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years American
• Former Network Engineer at the University of Queensland
Todd Palino
$ WHOAMI
• Senior Staff SRE @ LinkedIn
• Capacity Engineering Team
• Co-Author of Kafka: The Definitive Guide
• Late of VeriSign Infrastructure Engineering
When Operations Isn’t Perfect
Code Yellow
https://devops.com/code-yellow-when-operations-isnt-perfect/
This talk is not
• How to quickly erase all your technical debt
• How to change your engineering culture
This talk is
• How to identify team anti-patterns
• How to work through high toil
• How to create sustainable workloads
Today’s agenda
1 Background
2 Scenario 1: Traffic-SRE
3 Scenario 2: Kafka-SRE
4 Building A Formula For Success
5 Key Learnings
6 Q&A
Background
Personal Experience in the past 19 months
ASSISTANCE RENDERED
• Traffic-SRE: Technical Debt / Resource Allocation
• Voyager-SRE: Technical Debt
• Capacity War-room
• Espresso-SRE: Reliability
• Kafka-SRE: Capacity and Alert Fatigue
Scenario 1: Traffic-SRE
Problem Statement
Technical Debt
• Written documentation needed improvement
• Deployment infrastructure needed investment
• Alert Fatigue
Traffic-SRE
Problem Statement
Resource Allocations
• Backlog of work for clients
• Staff shortage
Scenario 2: Kafka-SRE
Problem Statement
Capacity Planning
• Multi-tenant Infrastructure
• No resource controls
• Unclear resource ownership
• Ad-hoc capacity planning
• Sudden 100% increase in traffic
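One way to make "ad-hoc capacity planning" and "unclear resource ownership" concrete is to attribute shared-cluster traffic to owners and ask whether the cluster would survive a sudden doubling like the one listed above. This is a minimal sketch under stated assumptions: `TenantUsage`, `utilization`, `unowned_share`, `survives_growth` and the 80% utilization ceiling are all hypothetical, not LinkedIn's or Kafka's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class TenantUsage:
    """Hypothetical per-tenant load on a shared, multi-tenant cluster."""
    owner: str               # owning team; empty string when nobody claims the traffic
    bytes_in_per_sec: float  # observed inbound (produce) rate


def utilization(tenants: list[TenantUsage], capacity_bytes_per_sec: float,
                growth: float = 1.0) -> float:
    """Fraction of ingress capacity used if every tenant's traffic is scaled by `growth`."""
    total = sum(t.bytes_in_per_sec for t in tenants) * growth
    return total / capacity_bytes_per_sec


def unowned_share(tenants: list[TenantUsage]) -> float:
    """Fraction of traffic with no clear owner (0.0 for an idle cluster)."""
    total = sum(t.bytes_in_per_sec for t in tenants)
    unowned = sum(t.bytes_in_per_sec for t in tenants if not t.owner)
    return unowned / total if total else 0.0


def survives_growth(tenants: list[TenantUsage], capacity_bytes_per_sec: float,
                    growth: float, max_utilization: float = 0.8) -> bool:
    """True if the scaled traffic stays under an assumed utilization ceiling."""
    return utilization(tenants, capacity_bytes_per_sec, growth) <= max_utilization


if __name__ == "__main__":
    # Illustrative numbers only: three tenants, one with no known owner.
    cluster = [TenantUsage("team-a", 40e6), TenantUsage("team-b", 20e6), TenantUsage("", 10e6)]
    print(utilization(cluster, 100e6))           # 0.7 of capacity today
    print(survives_growth(cluster, 100e6, 2.0))  # a 100% traffic increase: False
    print(unowned_share(cluster))                # roughly 14% of traffic has no owner
```

Reporting the unowned share next to utilization keeps the ownership problem visible: capacity that nobody owns is capacity nobody will plan for.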
Problem Statement
Alert Fatigue
• Multiple applications overutilized
• No time for proactive work
• Most alerts non-actionable
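A first step toward measuring alert fatigue is to record, for every page, whether it required action, then rank the rules that most often fire without needing it. A minimal sketch, assuming a hypothetical `Alert` record whose `actionable` flag is filled in by the on-call engineer; the rule names in the example are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical record of one page sent to the on-call engineer."""
    rule: str         # name of the alerting rule that fired
    actionable: bool  # did a human actually have to do something?


def non_actionable_rate(alerts: list[Alert]) -> float:
    """Share of pages that needed no action (0.0 when there were no pages)."""
    if not alerts:
        return 0.0
    return sum(not a.actionable for a in alerts) / len(alerts)


def noisiest_rules(alerts: list[Alert], top_n: int = 3) -> list[tuple[str, int]]:
    """Rules that most often fire without needing action: candidates to tune or delete."""
    counts = Counter(a.rule for a in alerts if not a.actionable)
    return counts.most_common(top_n)


if __name__ == "__main__":
    week = [
        Alert("under_replicated_partitions", True),
        Alert("disk_usage_warning", False),
        Alert("disk_usage_warning", False),
        Alert("request_latency_p99", False),
        Alert("disk_usage_warning", False),
    ]
    print(non_actionable_rate(week))  # 0.8
    print(noisiest_rules(week, 2))    # [('disk_usage_warning', 3), ('request_latency_p99', 1)]
```

Ranking by count of non-actionable pages points the first cleanup effort at the rules that cost the on-call engineer the most interruptions.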
Building a formula for success
Code Yellow
Building a formula for success
• Problem Statement: Define the areas that need attacking
• Exit Criteria: Define success criteria
• Resource Acquisition: Get the help that you require
• Planning: Plan for short-term & long-term
• Communication & Partnerships: Communicate expectations with clients & partners
Define the areas that need attacking
Problem Statement
• Admit there is a problem
• Measure the problem
• Understand the problem
• Determine the underlying causes that need to be fixed
Define success criteria
Exit Criteria
• Define concrete goals
• Define concrete success criteria
• Measure via an operational metric
• Measure via a project being completed
• Define timelines for completion
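The exit criteria above can be written down as data, so progress is checked against a metric and a date rather than debated. A hedged sketch: `ExitCriterion`, `status`, and the example targets and dates are hypothetical; a completed project is modelled as a metric that reaches 1.0.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExitCriterion:
    """Hypothetical exit criterion: a concrete target on a metric, with a deadline."""
    description: str
    target: float
    deadline: date
    higher_is_better: bool = False  # e.g. True for "project completed" (1.0 = done)

    def met(self, value: float) -> bool:
        return value >= self.target if self.higher_is_better else value <= self.target


def status(measured: list[tuple[ExitCriterion, float]], today: date) -> dict[str, str]:
    """Classify each criterion as met, in progress, or missed (past its deadline)."""
    result: dict[str, str] = {}
    for criterion, value in measured:
        if criterion.met(value):
            result[criterion.description] = "met"
        elif today > criterion.deadline:
            result[criterion.description] = "missed"
        else:
            result[criterion.description] = "in progress"
    return result


if __name__ == "__main__":
    measured = [
        # Operational metric: share of non-actionable alerts (illustrative numbers).
        (ExitCriterion("non-actionable alerts <= 20%", 0.20, date(2030, 6, 30)), 0.35),
        # Project completion, encoded as 1.0 once the project has shipped.
        (ExitCriterion("deployment tooling rebuilt", 1.0, date(2030, 5, 31),
                       higher_is_better=True), 1.0),
    ]
    print(status(measured, today=date(2030, 6, 1)))
    # {'non-actionable alerts <= 20%': 'in progress', 'deployment tooling rebuilt': 'met'}
```

The same status map doubles as the regular progress update for stakeholders.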
Get the help you require
Resource Acquisition
• Ask other teams for help
• Get dedicated engineers, project managers, or other roles as required
• Set an exit date for resources
Plan for the short-term & long-term
Planning
• Plan out short-term work
• Plan out longer-term projects
• Do they need to be rescheduled?
• Prioritize work that will reduce toil & burnout (Automation + Measurement)
Communicate expectations with clients & partners
Communication & Partnerships
• Communicate problem statement & exit criteria
• Send regular progress updates
• Ensure that stakeholders understand delays & expected outcomes
Key Learnings
• Measure: Measure toil/overhead
• Prioritize: Prioritize efforts to remove overhead/toil
• Communicate: Communicate with partners & teams
Q&A
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 

Helping operations top-heavy teams the smart way

  • 1. Helping operations top-heavy teams the smart way Jeff Weiner Chief Executive Officer Michael Kehoe Staff Site Reliability Engineer Todd Palino Sr Staff Site Reliability Engineer
  • 2. This Is The Only Slide You May Need a Picture Of slideshare.net/ToddPalino slideshare.net/MichaelKehoe3
  • 3. Michael Kehoe $ WHOAMI • Staff Site Reliability Engineer @ LinkedIn • Production-SRE Team • Funny accent = Australian + 4 years American • Former Network Engineer at the University of Queensland
  • 4. Todd Palino $ WHOAMI • Senior Staff SRE @ LinkedIn • Capacity Engineering Team • Co-Author of Kafka: The Definitive Guide • Late of VeriSign Infrastructure Engineering
  • 5. When Operations Isn’t Perfect Code Yellow https://devops.com/code-yellow-when-operations-isnt-perfect/
  • 6. This talk is not: • How to quickly erase all your technical debt • How to change your engineering culture
  • 7. This talk is: • How to identify team anti-patterns • How to work through high toil • How to create sustainable workloads
  • 8. Today’s agenda 1 Background 2 Scenario 1: Traffic-SRE 3 Scenario 2: Kafka-SRE 4 Building A Formula For Success 5 Key Learnings 6 Q&A
  • 10. Personal experience in the past 19 months, ASSISTANCE RENDERED: • Traffic-SRE: Technical Debt/Resource Allocation • Voyager-SRE: Technical Debt • Capacity War-room • Espresso-SRE: Reliability • Kafka-SRE: Capacity and Alert Fatigue
  • 12. Problem Statement (Traffic-SRE): Technical Debt • Written documentation needed improvement • Deployment infrastructure needed investment • Alert fatigue
  • 13. Problem Statement: Resource Allocations • Backlog of work for clients • Staff shortage
  • 15.
  • 16. Problem Statement: Capacity Planning • Multi-tenant infrastructure • No resource controls • Unclear resource ownership • Ad-hoc capacity planning • Sudden 100% increase in traffic
  • 17. Problem Statement: Alert Fatigue • Multiple applications overutilized • No time for proactive work • Most alerts non-actionable
  • 18. Building a formula for success
  • 20. Building a formula for success: • Problem Statement: define the areas that need attacking • Communication & Partnerships: communicate expectations with clients & partners • Exit Criteria: define success criteria • Resource Acquisition: get the help that you require • Planning: plan for the short-term & long-term
  • 21. Problem Statement: Define the areas that need attacking • Admit there is a problem • Measure the problem • Understand the problem • Determine the underlying causes that need to be fixed
  • 22. Exit Criteria: Define success criteria • Define concrete goals • Define concrete success criteria • Measure via an operational metric • Measure via a project being completed • Define timelines for completion
  • 23. Resource Acquisition: Get the help you require • Ask other teams for help • Get dedicated engineers/project managers/other roles as required • Set an exit date for borrowed resources
  • 24. Planning: Plan for the short-term & long-term • Plan out short-term work • Plan out longer-term projects • Do they need to be rescheduled? • Prioritize work that will reduce toil & burnout (Automation + Measurement)
  • 25. Communication & Partnerships: Communicate expectations with clients & partners • Communicate the problem statement & exit criteria • Send regular progress updates • Ensure that stakeholders understand delays & expected outcomes
  • 27. Key Learnings: • Measure: measure toil/overhead • Prioritize: prioritize efforts to remove overhead/toil • Communicate: communicate with partners & teams
  • 28. Q&A
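Slide 17 describes alert fatigue where "most alerts [are] non-actionable". A minimal sketch of how a team might put a number on that, assuming it can export a log of pages with a yes/no "required action" flag. The `AlertRecord` schema, the alert names, and both helper functions are hypothetical illustrations, not a LinkedIn tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRecord:
    """One page sent to on-call (hypothetical schema)."""
    name: str          # which alert rule fired
    actionable: bool   # did the responder have to take action?


def actionable_ratio(alerts: list[AlertRecord]) -> float:
    """Fraction of pages that required human action; 0.0 for an empty log."""
    if not alerts:
        return 0.0
    return sum(a.actionable for a in alerts) / len(alerts)


def noisiest_non_actionable(alerts: list[AlertRecord], top: int = 3) -> list[tuple[str, int]]:
    """Alert names that most often fired without needing action: candidates to retune or delete."""
    counts = Counter(a.name for a in alerts if not a.actionable)
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical week of pages.
    week = [
        AlertRecord("disk-usage-warning", False),
        AlertRecord("disk-usage-warning", False),
        AlertRecord("broker-offline", True),
        AlertRecord("under-replicated-partitions", False),
        AlertRecord("disk-usage-warning", False),
    ]
    print(f"actionable: {actionable_ratio(week):.0%}")  # 1 of 5 pages -> 20%
    print(noisiest_non_actionable(week))
```

Tracking the ratio week over week gives a baseline for "measure the problem" (slide 21), and the noisiest non-actionable alerts are the obvious first targets for tuning.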
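Slide 16 lists ad-hoc capacity planning on shared, multi-tenant infrastructure alongside a sudden 100% increase in traffic. The sketch below is one rough way to ask "which clusters break if traffic doubles?". It assumes load scales linearly with traffic and uses an 80% utilization ceiling. Both are illustrative assumptions rather than figures from the talk, and `ClusterUsage` and the cluster names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterUsage:
    """Peak usage snapshot for one shared cluster (hypothetical schema)."""
    name: str
    peak_utilization: float  # fraction of capacity in use at peak, e.g. 0.45 == 45%


def projected_utilization(usage: ClusterUsage, traffic_multiplier: float) -> float:
    """Naive linear projection: assumes load scales in proportion to traffic."""
    return usage.peak_utilization * traffic_multiplier


def clusters_at_risk(
    clusters: list[ClusterUsage],
    traffic_multiplier: float = 2.0,  # a 100% traffic increase
    ceiling: float = 0.8,             # assumed safe maximum utilization
) -> list[str]:
    """Names of clusters whose projected peak would exceed the ceiling."""
    return [
        c.name
        for c in clusters
        if projected_utilization(c, traffic_multiplier) > ceiling
    ]


if __name__ == "__main__":
    fleet = [
        ClusterUsage("cluster-a", 0.55),
        ClusterUsage("cluster-b", 0.30),
        ClusterUsage("cluster-c", 0.42),
    ]
    print(clusters_at_risk(fleet))  # ['cluster-a', 'cluster-c']
```

Real systems rarely scale perfectly linearly, so a projection like this is a starting point for a capacity conversation, not a substitute for load testing.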
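Slide 22 asks for concrete exit criteria, measured either by an operational metric or by a completed project, each with a timeline. A minimal sketch of tracking such criteria, assuming each one can be reduced to a current value, a target, and a deadline. `ExitCriterion`, `ready_to_exit`, and the example metrics and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExitCriterion:
    """One concrete goal backed by a measurable value (hypothetical schema)."""
    description: str
    current: float
    target: float
    deadline: date
    lower_is_better: bool = True  # e.g. pages per week; set False for coverage-style metrics

    def met(self) -> bool:
        if self.lower_is_better:
            return self.current <= self.target
        return self.current >= self.target

    def status(self, today: date) -> str:
        if self.met():
            return "met"
        return "overdue" if today > self.deadline else "in progress"


def ready_to_exit(criteria: list[ExitCriterion]) -> bool:
    """The effort can wind down only when every criterion is met."""
    return all(c.met() for c in criteria)


if __name__ == "__main__":
    criteria = [
        ExitCriterion("pages per on-call week", 25, 10, date(2020, 6, 30)),
        # A completed project can be modeled as 0/1 against a target of 1.
        ExitCriterion("deploy tooling migrated", 1, 1, date(2020, 6, 30), lower_is_better=False),
    ]
    for c in criteria:
        print(c.description, "->", c.status(date(2020, 5, 1)))
    print("ready to exit:", ready_to_exit(criteria))
```

Printing the status of every criterion also gives a ready-made body for the regular progress updates slide 25 recommends.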
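Slides 24 and 27 pair measurement with prioritization: measure toil, then prioritize the work that removes it. One simple heuristic, offered here as an assumption and not as the speakers' method, is to rank toil sources by how quickly automating them pays for itself. The sketch assumes the team logs hours of manual work per source and estimates the engineer-weeks needed to automate each one. `ToilSource`, the 40-hour engineer-week, and the source names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToilSource:
    """A recurring manual task (hypothetical schema)."""
    name: str
    hours_per_week: float           # measured manual effort it costs the team
    automation_effort_weeks: float  # estimated engineer-weeks to eliminate it

    def payback_weeks(self, hours_per_engineer_week: float = 40.0) -> float:
        """Weeks of saved toil needed to repay the automation investment."""
        if self.hours_per_week <= 0:
            return float("inf")  # nothing to save, never pays back
        return self.automation_effort_weeks * hours_per_engineer_week / self.hours_per_week


def prioritize(sources: list[ToilSource]) -> list[ToilSource]:
    """Order toil sources so the fastest payback comes first."""
    return sorted(sources, key=lambda s: s.payback_weeks())


def toil_share(sources: list[ToilSource], team_hours_per_week: float) -> float:
    """Fraction of the team's weekly hours consumed by the listed toil."""
    return sum(s.hours_per_week for s in sources) / team_hours_per_week


if __name__ == "__main__":
    sources = [
        ToilSource("manual-rebalancing", 12, 3),  # 3 * 40 / 12 = 10 weeks
        ToilSource("manual-deploys", 8, 1),       # 1 * 40 / 8  = 5 weeks
        ToilSource("ticket-triage", 4, 4),        # 4 * 40 / 4  = 40 weeks
    ]
    print([s.name for s in prioritize(sources)])
    print(f"toil share: {toil_share(sources, team_hours_per_week=200):.0%}")  # 24 / 200 -> 12%
```

Payback time ignores burnout and risk, so a team might still choose to tackle a painful, slow-to-pay-back task first; the numbers are an input to that judgment, not a replacement for it.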