SlideShare a Scribd company logo
1 of 29
Helping operations top-heavy
teams the smart way
Jeff Weiner
Chief Executive Officer
Michael Kehoe
Staff Site Reliability Engineer
Todd Palino
Sr Staff Site Reliability Engineer
This Is The Only Slide You May Need a Picture Of
slideshare.net/ToddPalino slideshare.net/MichaelKehoe3
Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years
American
• Former Network Engineer at the
University of Queensland
Todd Palino
$ WHOAMI
• Senior Staff SRE @ LinkedIn
• Capacity Engineering Team
• Co-Author of Kafka: The Definitive Guide
• Late of VeriSign Infrastructure
Engineering
When Operations Isn’t Perfect
Code Yellow
https://devops.com/code-yellow-when-operations-isnt-perfect/
• How to quickly erase all your
technical debt
• How to change your engineering
culture
This talk is not
• How to identify team anti-patterns
• How to work through high toil
• How to create sustainable
workloads
This talk is
Today’s
agenda
1 Background
2 Scenario 1: Traffic-SRE
3 Scenario 2: Kafka-SRE
4 Building A Formula For Success
5 Key Learnings
6 Q&A
Background
Personal Experience in the past 19 months
ASSISTANCE RENDERED
• Traffic-SRE: Technical Debt/ Resource
Allocation
• Voyager-SRE: Technical Debt
• Capacity War-room
• Espresso-SRE: Reliability
• Kafka-SRE: Capacity and Alert Fatigue
Scenario 1: Traffic-SRE
Problem Statement
Technical Debt
• Written documentation needed
improvement
• Deployment infrastructure needed
investment
• Alert Fatigue
Traffic-SRE
Problem Statement
Resource Allocations
• Backlog of work for clients
• Staff shortage
Scenario 2: Kafka
Problem Statement
Capacity Planning
• Multi-tenant Infrastructure
• No resource controls
• Unclear resource ownership
• Ad-hoc capacity planning
• Sudden 100% increase in traffic
Problem Statement
Alert Fatigue
• Multiple applications overutilized
• No time for proactive work
• Most alerts non-actionable
Building a formula for
success
Code Yellow
Building a formula for success
Define the areas
that need attacking
Problem Statement
Communicate
expectations with
clients & partners
Communication &
Partnerships
Define success
criteria
Exit Criteria
Get the help that
you require
Resource
Acquisition
Plan for short-term
& long-term
Planning
Define the areas that need attacking
Problem Statement
• Admit there is a problem
• Measure the problem
• Understand the problem
• Determines underlying causes that
need to be fixed
Building a formula for success
Define success criteria
Exit Criteria
• Define concrete goals
• Define concrete success criteria
• Measure via an operational metric
• Measure via a project being
completed
• Define timelines for completion
Building a formula for success
Get the help you require
Resource Acquisition
• Ask other teams for help
• Get dedicated engineers/ project
managers/ other roles as required
• Set exit-date for resources
Building a formula for success
Plan for the short-term & long-term
Planning
• Plan out short-term work
• Plan out longer-term projects
• Do they need to be rescheduled?
• Prioritize work that will reduce toil &
burnout (Automation +
Measurement)
Building a formula for success
Communicate expectations with
clients & partners
Communicatio
n &
Partnerships
• Communicate problem statement &
exit criteria
• Send regular progress updates
• Ensure that stakeholders
understand delays & expected
outcomes
Building a formula for success
Key Learnings
Key Learnings
Measure toil/
overhead
Measure
Prioritize efforts to
remove overhead/toil
Prioritize
Communicate with
partners & teams
Communicate
Q&A
Code Yellow: Helping Operations Top-Heavy Teams the Smart Way

More Related Content

What's hot

Agile performance testing
Agile performance testingAgile performance testing
Agile performance testing
Cesario Ramos
 
Agile_SDLC_Node.js@Paypal_ppt
Agile_SDLC_Node.js@Paypal_pptAgile_SDLC_Node.js@Paypal_ppt
Agile_SDLC_Node.js@Paypal_ppt
Hitesh Kumar
 

What's hot (20)

The Continuous delivery value - Funaro
The Continuous delivery value - FunaroThe Continuous delivery value - Funaro
The Continuous delivery value - Funaro
 
Technical Capabilities as enabler for Agile and DevOps
Technical Capabilities as enabler for Agile and DevOpsTechnical Capabilities as enabler for Agile and DevOps
Technical Capabilities as enabler for Agile and DevOps
 
So long scrum, hello kanban
So long scrum, hello kanbanSo long scrum, hello kanban
So long scrum, hello kanban
 
Working Effectively with PeopleSoft Support
Working Effectively with PeopleSoft SupportWorking Effectively with PeopleSoft Support
Working Effectively with PeopleSoft Support
 
Agile performance testing
Agile performance testingAgile performance testing
Agile performance testing
 
Extreme Makeover OnBase Edition
Extreme Makeover OnBase EditionExtreme Makeover OnBase Edition
Extreme Makeover OnBase Edition
 
Implementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should KnowImplementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should Know
 
Making disaster routine
Making disaster routineMaking disaster routine
Making disaster routine
 
DevOps for Database webinar
DevOps for Database webinarDevOps for Database webinar
DevOps for Database webinar
 
So you-want-to-go-faster
So you-want-to-go-fasterSo you-want-to-go-faster
So you-want-to-go-faster
 
Django production
Django productionDjango production
Django production
 
Api360 Summit The Automated Monolith
Api360 Summit  The Automated MonolithApi360 Summit  The Automated Monolith
Api360 Summit The Automated Monolith
 
Outsmarting Merge Edge Cases in Component Based Design
Outsmarting Merge Edge Cases in Component Based DesignOutsmarting Merge Edge Cases in Component Based Design
Outsmarting Merge Edge Cases in Component Based Design
 
Kudo codefest : Delivering High Quality Software Through Better Release Process
Kudo codefest : Delivering High Quality Software Through Better Release ProcessKudo codefest : Delivering High Quality Software Through Better Release Process
Kudo codefest : Delivering High Quality Software Through Better Release Process
 
DR in the Cloud: Finding the Right Tool for the Job
DR in the Cloud: Finding the Right Tool for the JobDR in the Cloud: Finding the Right Tool for the Job
DR in the Cloud: Finding the Right Tool for the Job
 
Agile_SDLC_Node.js@Paypal_ppt
Agile_SDLC_Node.js@Paypal_pptAgile_SDLC_Node.js@Paypal_ppt
Agile_SDLC_Node.js@Paypal_ppt
 
NYC MeetUp 10.9
NYC MeetUp 10.9NYC MeetUp 10.9
NYC MeetUp 10.9
 
Scaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON TutorialScaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON Tutorial
 
5 Cloud Migration Experiences Not to Be Repeated
5 Cloud Migration Experiences Not to Be Repeated5 Cloud Migration Experiences Not to Be Repeated
5 Cloud Migration Experiences Not to Be Repeated
 
Kanban VS Scrum
Kanban VS ScrumKanban VS Scrum
Kanban VS Scrum
 

Similar to Code Yellow: Helping Operations Top-Heavy Teams the Smart Way

Success recipe for new IT projects-Agile way. Fail Fast, Fail Early
Success recipe for new IT projects-Agile way. Fail Fast, Fail EarlySuccess recipe for new IT projects-Agile way. Fail Fast, Fail Early
Success recipe for new IT projects-Agile way. Fail Fast, Fail Early
Joseph Vargheese PMP CSM CSP
 
103240-The-New-Way-of-Thinking-Our-Implementation-experience-with-Oracle-HCM-...
103240-The-New-Way-of-Thinking-Our-Implementation-experience-with-Oracle-HCM-...103240-The-New-Way-of-Thinking-Our-Implementation-experience-with-Oracle-HCM-...
103240-The-New-Way-of-Thinking-Our-Implementation-experience-with-Oracle-HCM-...
ssuser835d1a
 
Doing It On Your Own: When to Call in the Consultants, When to Leave Them Out
Doing It On Your Own: When to Call in the Consultants, When to Leave Them OutDoing It On Your Own: When to Call in the Consultants, When to Leave Them Out
Doing It On Your Own: When to Call in the Consultants, When to Leave Them Out
NTEN
 
Scrum Agile by David Mann
 Scrum Agile by David Mann Scrum Agile by David Mann
Scrum Agile by David Mann
James Sutter
 
Process improvement scrum_agile_v2_by_david_mann
Process improvement scrum_agile_v2_by_david_mannProcess improvement scrum_agile_v2_by_david_mann
Process improvement scrum_agile_v2_by_david_mann
Jim Sutter
 

Similar to Code Yellow: Helping Operations Top-Heavy Teams the Smart Way (20)

Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
BoS2015 Jeff Szczepanski – COO, Stack Exchange - Stack Overflow. Scaling a Te...
BoS2015 Jeff Szczepanski – COO, Stack Exchange - Stack Overflow. Scaling a Te...BoS2015 Jeff Szczepanski – COO, Stack Exchange - Stack Overflow. Scaling a Te...
BoS2015 Jeff Szczepanski – COO, Stack Exchange - Stack Overflow. Scaling a Te...
 
Success recipe for new IT projects-Agile way. Fail Fast, Fail Early
Success recipe for new IT projects-Agile way. Fail Fast, Fail EarlySuccess recipe for new IT projects-Agile way. Fail Fast, Fail Early
Success recipe for new IT projects-Agile way. Fail Fast, Fail Early
 
Applying both of waterfall and iterative development
Applying both of waterfall and iterative developmentApplying both of waterfall and iterative development
Applying both of waterfall and iterative development
 
INAAU Project Management for Telecommunications Professionals
INAAU Project Management for Telecommunications ProfessionalsINAAU Project Management for Telecommunications Professionals
INAAU Project Management for Telecommunications Professionals
 
American Electric Power Ercot kickoff
American Electric Power Ercot kickoffAmerican Electric Power Ercot kickoff
American Electric Power Ercot kickoff
 
103240-The-New-Way-of-Thinking-Our-Implementation-experience-with-Oracle-HCM-...
103240-The-New-Way-of-Thinking-Our-Implementation-experience-with-Oracle-HCM-...103240-The-New-Way-of-Thinking-Our-Implementation-experience-with-Oracle-HCM-...
103240-The-New-Way-of-Thinking-Our-Implementation-experience-with-Oracle-HCM-...
 
The Dashlane Agile Journey
The Dashlane Agile JourneyThe Dashlane Agile Journey
The Dashlane Agile Journey
 
Engineering Teams and Systems for Velocity
Engineering Teams and Systems for VelocityEngineering Teams and Systems for Velocity
Engineering Teams and Systems for Velocity
 
CRMready Webinar Series - Part 3 - How to Make Your Nonprofit’s CRM Implement...
CRMready Webinar Series - Part 3 - How to Make Your Nonprofit’s CRM Implement...CRMready Webinar Series - Part 3 - How to Make Your Nonprofit’s CRM Implement...
CRMready Webinar Series - Part 3 - How to Make Your Nonprofit’s CRM Implement...
 
AVATA Webinar: Solutions to Common Demantra & ASCP Challenges
AVATA Webinar: Solutions to Common Demantra & ASCP ChallengesAVATA Webinar: Solutions to Common Demantra & ASCP Challenges
AVATA Webinar: Solutions to Common Demantra & ASCP Challenges
 
Doing It On Your Own: When to Call in the Consultants, When to Leave Them Out
Doing It On Your Own: When to Call in the Consultants, When to Leave Them OutDoing It On Your Own: When to Call in the Consultants, When to Leave Them Out
Doing It On Your Own: When to Call in the Consultants, When to Leave Them Out
 
XebiCon'17 : //Tam-tams// Voici l’histoire de la disparition des dinosaures d...
XebiCon'17 : //Tam-tams// Voici l’histoire de la disparition des dinosaures d...XebiCon'17 : //Tam-tams// Voici l’histoire de la disparition des dinosaures d...
XebiCon'17 : //Tam-tams// Voici l’histoire de la disparition des dinosaures d...
 
Kristian Fischer - Put Test in the Driver's Seat
Kristian Fischer - Put Test in the Driver's SeatKristian Fischer - Put Test in the Driver's Seat
Kristian Fischer - Put Test in the Driver's Seat
 
Pm training day 3
Pm training   day 3Pm training   day 3
Pm training day 3
 
Scrum Agile by David Mann
 Scrum Agile by David Mann Scrum Agile by David Mann
Scrum Agile by David Mann
 
Process improvement scrum_agile_v2_by_david_mann
Process improvement scrum_agile_v2_by_david_mannProcess improvement scrum_agile_v2_by_david_mann
Process improvement scrum_agile_v2_by_david_mann
 
Fundamentals of agile tntu (2015-04-27)
Fundamentals of agile   tntu (2015-04-27)Fundamentals of agile   tntu (2015-04-27)
Fundamentals of agile tntu (2015-04-27)
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 

More from Todd Palino

Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Todd Palino
 
I'm No Hero: Full Stack Reliability at LinkedIn
I'm No Hero: Full Stack Reliability at LinkedInI'm No Hero: Full Stack Reliability at LinkedIn
I'm No Hero: Full Stack Reliability at LinkedIn
Todd Palino
 

More from Todd Palino (13)

Leading Without Managing: Becoming an SRE Technical Leader
Leading Without Managing: Becoming an SRE Technical LeaderLeading Without Managing: Becoming an SRE Technical Leader
Leading Without Managing: Becoming an SRE Technical Leader
 
From Operations to Site Reliability in Five Easy Steps
From Operations to Site Reliability in Five Easy StepsFrom Operations to Site Reliability in Five Easy Steps
From Operations to Site Reliability in Five Easy Steps
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to Know
 
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
 
Running Kafka for Maximum Pain
Running Kafka for Maximum PainRunning Kafka for Maximum Pain
Running Kafka for Maximum Pain
 
I'm No Hero: Full Stack Reliability at LinkedIn
I'm No Hero: Full Stack Reliability at LinkedInI'm No Hero: Full Stack Reliability at LinkedIn
I'm No Hero: Full Stack Reliability at LinkedIn
 
Multi tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafkaMulti tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafka
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak Performance
 
More Datacenters, More Problems
More Datacenters, More ProblemsMore Datacenters, More Problems
More Datacenters, More Problems
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into Overdrive
 
Tuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTuning Kafka for Fun and Profit
Tuning Kafka for Fun and Profit
 
Kafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesKafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier Architectures
 
Enterprise Kafka: Kafka as a Service
Enterprise Kafka: Kafka as a ServiceEnterprise Kafka: Kafka as a Service
Enterprise Kafka: Kafka as a Service
 

Recently uploaded

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 

Recently uploaded (20)

UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 

Code Yellow: Helping Operations Top-Heavy Teams the Smart Way

  • 1. Helping operations top-heavy teams the smart way Jeff Weiner Chief Executive Officer Michael Kehoe Staff Site Reliability Engineer Todd Palino Sr Staff Site Reliability Engineer
  • 2. This Is The Only Slide You May Need a Picture Of slideshare.net/ToddPalino slideshare.net/MichaelKehoe3
  • 3. Michael Kehoe $ WHOAMI • Staff Site Reliability Engineer @ LinkedIn • Production-SRE Team • Funny accent = Australian + 4 years American • Former Network Engineer at the University of Queensland
  • 4. Todd Palino $ WHOAMI • Senior Staff SRE @ LinkedIn • Capacity Engineering Team • Co-Author of Kafka: The Definitive Guide • Late of VeriSign Infrastructure Engineering
  • 5. When Operations Isn’t Perfect Code Yellow https://devops.com/code-yellow-when-operations-isnt-perfect/
  • 6. • How to quickly erase all your technical debt • How to change your engineering culture This talk is not
  • 7. • How to identify team anti-patterns • How to work through high toil • How to create sustainable workloads This talk is
  • 8. Today’s agenda 1 Background 2 Scenario 1: Traffic-SRE 3 Scenario 2: Kafka-SRE 4 Building A Formula For Success 5 Key Learnings 6 Q&A
  • 10. Personal Experience in the past 19 months ASSISTANCE RENDERED • Traffic-SRE: Technical Debt/ Resource Allocation • Voyager-SRE: Technical Debt • Capacity War-room • Espresso-SRE: Reliability • Kafka-SRE: Capacity and Alert Fatigue
  • 12. Problem Statement Technical Debt • Written documentation needed improvement • Deployment infrastructure needed investment • Alert Fatigue Traffic-SRE
  • 13. Problem Statement Resource Allocations • Backlog of work for clients • Staff shortage
  • 15.
  • 16. Problem Statement Capacity Planning • Multi-tenant Infrastructure • No resource controls • Unclear resource ownership • Ad-hoc capacity planning • Sudden 100% increase in traffic
  • 17. Problem Statement Alert Fatigue • Multiple applications overutilized • No time for proactive work • Most alerts non-actionable
  • 18. Building a formula for success
  • 20. Building a formula for success Define the areas that need attacking Problem Statement Communicate expectations with clients & partners Communication & Partnerships Define success criteria Exit Criteria Get the help that you require Resource Acquisition Plan for short-term & long-term Planning
  • 21. Define the areas that need attacking Problem Statement • Admit there is a problem • Measure the problem • Understand the problem • Determines underlying causes that need to be fixed Building a formula for success
  • 22. Define success criteria Exit Criteria • Define concrete goals • Define concrete success criteria • Measure via an operational metric • Measure via a project being completed • Define timelines for completion Building a formula for success
  • 23. Get the help you require Resource Acquisition • Ask other teams for help • Get dedicated engineers/ project managers/ other roles as required • Set exit-date for resources Building a formula for success
  • 24. Plan for the short-term & long-term Planning • Plan out short-term work • Plan out longer-term projects • Do they need to be rescheduled? • Prioritize work that will reduce toil & burnout (Automation + Measurement) Building a formula for success
  • 25. Communicate expectations with clients & partners Communicatio n & Partnerships • Communicate problem statement & exit criteria • Send regular progress updates • Ensure that stakeholders understand delays & expected outcomes Building a formula for success
  • 27. Key Learnings Measure toil/ overhead Measure Prioritize efforts to remove overhead/toil Prioritize Communicate with partners & teams Communicate
  • 28. Q&A