Put some SRE
in your microservices
Hard-won lessons from the world of SRE…
can be applied to the nascent world of microservices.
The many faces of
Theo Schlossnagle
@postwait
CEO Circonus
The nature of the problem
Software Sucks
Once you’ve run software at scale,
you have a deep understanding of
how it is all tied together with
loose string and hope.
All software will fail, but
good software
fails well
• Consider the phrase:
“Have you used X in anger?”
Never undervalue grace in failure.
Rule 𝛌1: Crash landings should be both
fast and controlled.
What it means to
fail quickly & safely
• The scope of failure should
collapse completely.
• The time to failure should be
measured in small multiples of
normal service time.
• Nothing outside the scope of
failure should be impacted.
https://www.youtube.com/watch?v=5SL1A2d2e7M
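One way to keep the time to failure within a small multiple of normal service time is to derive each call's deadline from its typical latency, rather than using a generous fixed timeout. The following is a minimal sketch that assumes Python, a hypothetical `call_with_deadline` helper, and an arbitrary multiplier of 3. The typical latency would come from your own measurements.

```python
import concurrent.futures
import time

# Shared worker pool for guarded calls; the size is arbitrary for this sketch.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


class FastFailure(Exception):
    """Raised when a call exceeds its deadline (hypothetical name)."""


def deadline_for(typical_service_s: float, multiple: float = 3.0) -> float:
    # Time to failure is a small multiple of normal service time.
    return typical_service_s * multiple


def call_with_deadline(fn, typical_service_s, *args, multiple=3.0):
    """Run fn(*args), giving up after `multiple` x the typical latency."""
    timeout = deadline_for(typical_service_s, multiple)
    future = _POOL.submit(fn, *args)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the call has not started yet
        raise FastFailure(f"gave up after {timeout:.3f}s") from None


if __name__ == "__main__":
    print(call_with_deadline(lambda x: x + 1, 0.05, 41))
    try:
        call_with_deadline(time.sleep, 0.01, 0.5)
    except FastFailure as err:
        print("failed fast:", err)
```

Python cannot forcibly stop the worker thread, so an abandoned call keeps running in the background. In a real service, the deadline should also be propagated to the callee so it can stop work that nobody is waiting for.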
Autopsies: not just for medicine.
Rule 𝛌2: Post-mortems are
fundamental.
Pragmatic analysis is required to
understand failure’s
true nature
• Post-mortem analysis is critical
• Stack traces
• Forensic logs
• Images (cores, dumps, etc.)
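Post-mortem analysis depends on capturing evidence at the moment of failure. The following is a minimal Python sketch that assumes a hypothetical `forensic_record` schema and a JSON-lines file as the forensic log. It records the full stack trace plus some context, and it can be installed as the process-wide `sys.excepthook` so that uncaught exceptions are logged before the process exits. Core and heap images are outside its scope. Python's standard `faulthandler` module can additionally dump tracebacks on fatal signals.

```python
import json
import os
import sys
import time
import traceback


def forensic_record(exc_type, exc, tb, context=None):
    """Build a JSON-serializable record of one failure (hypothetical schema)."""
    return {
        "ts": time.time(),
        "pid": os.getpid(),
        "error": exc_type.__name__,
        "message": str(exc),
        # Full formatted stack, including any chained causes.
        "stack": traceback.format_exception(exc_type, exc, tb),
        "context": context or {},
    }


def install_crash_logger(path):
    """Append a forensic record to `path` for every uncaught exception."""
    previous = sys.excepthook

    def hook(exc_type, exc, tb):
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(forensic_record(exc_type, exc, tb)) + "\n")
        previous(exc_type, exc, tb)  # still print the usual report to stderr

    sys.excepthook = hook
```

Handled errors can use the same record. For example, a request handler can call `forensic_record(*sys.exc_info(), context={...})` inside an `except` block and ship the result to its log pipeline.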
The difference between a shock and electrocution is real.
Rule 𝛌3: Use circuit breakers.
Circuit breakers are designed to
avoid
cascading failure
• it’s not all about you,
especially with microservices
• protect yourselves and others
• circuit breakers of many types:
  • timing
  • queue depth
  • concurrency
http://melissaomarkham.com
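The following is a minimal sketch of two of the breaker types listed above, written in Python with hypothetical class names and parameters. A timing breaker trips after `max_failures` consecutive failures and stays open for `reset_after` seconds. A concurrency breaker rejects calls outright when `max_concurrent` are already in flight, instead of letting them queue. A queue-depth breaker would apply the same rejection rule to a backlog length. The clock is injectable so that behavior can be tested without sleeping.

```python
import threading
import time


class CircuitOpen(Exception):
    """Raised instead of calling a dependency the breaker considers unhealthy."""


class CircuitBreaker:
    """Closed/open/half-open breaker plus a concurrency cap (sketch)."""

    def __init__(self, max_failures=5, reset_after=10.0, max_concurrent=8,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self._failures = 0
        self._opened_at = None
        self._lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @property
    def state(self):
        with self._lock:
            return self._state_locked()

    def _state_locked(self):
        if self._opened_at is None:
            return "closed"
        if self.clock() - self._opened_at >= self.reset_after:
            return "half-open"  # let trial calls probe the dependency
        return "open"

    def call(self, fn, *args, **kwargs):
        with self._lock:
            if self._state_locked() == "open":
                raise CircuitOpen("failing fast: breaker is open")
        # Concurrency breaker: reject rather than queue when saturated.
        if not self._slots.acquire(blocking=False):
            raise CircuitOpen("failing fast: too many calls in flight")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self.max_failures or self._opened_at is not None:
                    # Trip, or re-trip after a failed half-open probe.
                    self._opened_at = self.clock()
            raise
        else:
            with self._lock:
                self._failures = 0
                self._opened_at = None
            return result
        finally:
            self._slots.release()
```

This sketch allows several probes at once while half-open, and it counts every exception as a failure. A production breaker would usually limit the number of probes and classify which errors should count.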
You cannot understand what you cannot measure.
Rule 𝛌4: Behavior is complex.
Understand it.
Don’t measure to assess availability;
measure to understand.
Build robust models of behavior.
Understand performance changes.
Don’t use averages.
Don’t use percentiles alone.
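To see why averages mislead and why percentiles alone are not enough, it helps to keep the whole latency distribution. The following is a minimal Python sketch of a simplified log-linear histogram that buckets integer microsecond latencies to two significant digits. The bucketing rule is an illustrative assumption, not the exact scheme of any particular library. The sample data is hypothetical: a bimodal service where cache hits take 1 ms and misses take 50 ms.

```python
from collections import Counter
from statistics import mean


def llbin(us: int) -> int:
    """Round a latency (integer microseconds) down to two significant digits.

    Buckets are linear within a power of ten and logarithmic across powers
    of ten, so each bucket's width is at most about 10% of its values.
    """
    if us < 100:
        return us  # below 100us every integer value gets its own bucket
    scale = 10 ** (len(str(us)) - 2)
    return us // scale * scale


class Histogram:
    """Keeps the whole latency distribution, not a single summary number."""

    def __init__(self):
        self.bins = Counter()  # bucket lower edge -> sample count

    def record(self, us: int) -> None:
        self.bins[llbin(us)] += 1

    def count(self) -> int:
        return sum(self.bins.values())

    def quantile(self, q: float) -> int:
        """Lower edge of the bucket holding the q-th sample (approximate)."""
        target = q * self.count()
        seen = 0
        for edge in sorted(self.bins):
            seen += self.bins[edge]
            if seen >= target:
                return edge
        raise ValueError("empty histogram")


if __name__ == "__main__":
    # Hypothetical bimodal service: 90% cache hits at 1ms, 10% misses at 50ms.
    samples = [1_000] * 900 + [50_000] * 100
    h = Histogram()
    for s in samples:
        h.record(s)
    print("mean us:", mean(samples))    # 5900: a latency no request had
    print("p50 us:", h.quantile(0.5))   # 1000
    print("p99 us:", h.quantile(0.99))  # 50000
    print("shape:", dict(sorted(h.bins.items())))
```

The mean lands at 5.9 ms, where no request actually was. The p50 and p99 values are individually correct, but neither says that a tenth of all requests are slow. The bucket counts show both modes directly. Histograms built this way can also be merged across hosts and time windows by adding counts, which averages and percentiles cannot be.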
It’s easy to demand perfection; it’s also stupid.
Rule 𝛌5: Have a failure budget.
Avoiding failure is simply impossible;
expect and manage
failure.
• use failure budgets
• set expectations reasonably
• define and reward successes on
improvement and competency,
not just uptime.
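A failure budget turns an availability target into an explicit allowance of failure that the team can spend. The following is a minimal Python sketch of the arithmetic, assuming a time-based budget and hypothetical function names. A request-based budget works the same way, using bad requests divided by total requests.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Time the service may be 'bad' in `window` under an availability SLO."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, bad_time: timedelta) -> float:
    """Fraction of the budget still unspent; negative means overspent."""
    return 1.0 - bad_time / error_budget(slo, window)


if __name__ == "__main__":
    month = timedelta(days=30)
    # A 99.9% SLO over 30 days allows 43 minutes 12 seconds of failure.
    print(error_budget(0.999, month))  # 0:43:12
    print(budget_remaining(0.999, month, timedelta(minutes=21, seconds=36)))  # 0.5
```

Setting the SLO below 100% is what makes the budget nonzero. That allowance is where risky changes, experiments, and honest incident reporting get their room.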
Justice should be blind; operations should not.
Rule 𝛌6: Instrumentation &
Observability have no equals.
For every “I wonder what X is right now?”
in production,
you must have answers
DTrace
eBPF
Instrument code for observability
https://www.pinterest.com/pin/441775044670412234/
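DTrace and eBPF let you ask questions of a running system from the outside. Instrumenting your own code adds answers in the application's own terms. The following is a minimal Python sketch that assumes a hypothetical in-process `Metrics` registry and an `instrumented` decorator. It records calls, errors, in-flight count, and a power-of-two latency histogram per operation. A real service would export these values to its monitoring system rather than keep them only in memory.

```python
import functools
import threading
import time
from collections import Counter, defaultdict


class Metrics:
    """Tiny in-process registry (hypothetical); a real service would export it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counters = Counter()
        self.latency_us = defaultdict(Counter)  # name -> {bucket: count}

    def incr(self, name, n=1):
        with self._lock:
            self.counters[name] += n

    def observe_us(self, name, us):
        # Power-of-two buckets keyed by their lower edge (0 for sub-microsecond).
        bucket = 1 << (us.bit_length() - 1) if us > 0 else 0
        with self._lock:
            self.latency_us[name][bucket] += 1

    def snapshot(self):
        """Answer "what is X right now?" with a consistent copy of everything."""
        with self._lock:
            return (dict(self.counters),
                    {k: dict(v) for k, v in self.latency_us.items()})


METRICS = Metrics()


def instrumented(name, metrics=METRICS):
    """Decorator recording calls, errors, in-flight count, and latency."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter_ns()
            metrics.incr(f"{name}.calls")
            metrics.incr(f"{name}.inflight")
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics.incr(f"{name}.errors")
                raise
            finally:
                metrics.incr(f"{name}.inflight", -1)
                elapsed_us = (time.perf_counter_ns() - start) // 1000
                metrics.observe_us(f"{name}.latency", elapsed_us)
        return wrapper
    return decorate
```

The `inflight` counter is the kind of "right now" value that a post-hoc log search cannot give you. The latency buckets preserve the distribution rather than an average.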
Thank you.

Applying SRE techniques to micro service design
