From Apollo 13 to Google SRE

The complexity of delivering the high level of reliability expected of web-based, cloud-hosted systems today, together with the expectation of Continuous Delivery of new features, has led to the evolution of a new field of Service Reliability Engineering tailored to such systems. Google, a pioneer in this field, calls it Site Reliability Engineering (SRE). While it would be more aptly named Service Reliability Engineering, the name has caught on. The seminal work documenting Google's approach and practices is Google's book of the same name (commonly referred to as the ‘SRE book’), which has become the de facto standard on how to adopt SRE in an organization. This session covers adopting SRE as a practice in organizations that are also adopting DevOps, the challenges large traditional enterprises face in adopting SRE, and how to overcome them.

  1. © 2015 IBM Corporation. From Apollo 13 to Google SRE: When DevOps met SRE. Sanjeev Sharma, CTO – DevOps Adoption, IBM Distinguished Engineer. @sd_architect | sdarchitect.blog
  2. © 2016 IBM Corporation. #WhoAmI
     • 20+ years in software development and delivery
     • IBM Distinguished Engineer and CTO for DevOps Adoption
     • Author of two DevOps books:
       • DevOps For Dummies: https://ibm.biz/BdsPMX
       • The DevOps Adoption Playbook: http://amzn.to/2hH7rt2
     • Blog: https://sdarchitect.blog
     • @sd_architect
  3. What is SRE? Site Reliability Engineering (SRE): Google’s approach to Service Management. “SRE is what happens when you ask a software engineer to design an operations team.” (Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy, “Site Reliability Engineering”)
  4. Apollo 13 – The real heroes. (Image courtesy: Universal Pictures, NASA)
  5. Reliability: The Real Availability Numbers! How much downtime does a given availability level translate to?
     • 5-nines (99.999%): daily 0.9s; weekly 6.0s; monthly 26.3s; yearly 5m 15.6s
     • 4-nines (99.99%): daily 8.6s; weekly 1m 0.5s; monthly 4m 23.0s; yearly 52m 35.7s
     Even the more common 99.95% availability SLO allows a mere 43 seconds/day, or about 5 minutes 2 seconds/week.
  6. Eight Tenets of Google SRE
     1. Ensuring a Durable Focus on Engineering
     2. Pursuing Maximum Change Velocity Without Violating a Service’s SLO
     3. Monitoring
     4. Emergency Response
     5. Change Management
     6. Demand Forecasting and Capacity Planning
     7. Provisioning
     8. Efficiency and Performance
  7. Best Practices of Incident Management (Image courtesy: Universal Pictures, NASA)
     1. Prioritize
     2. Prepare
     3. Trust
     4. Introspect
     5. Consider alternatives
     6. Practice
     7. Change it around
  8. DevOps + SRE in the Enterprise: Balancing Innovation and Optimization
     [Diagram: four delivery pipelines (Development, SCM, Build, Package Repo, Deploy), one each for a Mainframe Hosted App, a Mobile App, an App Server Monolithic App, and a Cloud Native App, flowing through Test, Stage, and Production into an Enterprise Release that delivers Business Capability.]
     Agile/Innovation Edge: Rapid Delivery for Innovation
     • Agile
     • Antifragile
     • Experimentation
     • New and Innovative
     • Hybrid Cloud
     • IaaS/PaaS
     • Containers
     Industrialized Core: Deliver at regular cadence
     • Agile
     • Stability
     • Predictability
     • Lean Delivery pipeline
     • Core and Legacy Systems
     Hybrid Infrastructure – Physical, Cloud
     • IaaS/PaaS
     • Containers
  9. Touchpoints of Standardization Across Delivery Pipelines
     [Diagram: the delivery pipelines and the Agile/Innovation Edge and Industrialized Core descriptions from slide 8, here for Application A, Application B, Application C, and Application N.]
     • Planning and Architecture
     • APIs
     • Deployment Automation and Orchestration
     • Service and Test Environment Virtualization
     • Release Management
     • Operational Readiness
  10. When DevOps met SRE
     [Diagram: the delivery pipelines from slide 8 for Application A, Application B, Application C, and Application N, with the touchpoints below labeled “DevOps” and “SRE”.]
     • Planning and Architecture
     • APIs
     • Deployment Automation and Orchestration
     • Service and Test Environment Virtualization
     • Release Management
     • Operational Readiness
  11. Architecture and Planning. Your delivery pipeline will be as fast as the slowest delivery pipeline it is dependent on.
  12. APIs. Modernizing to a microservices-based architecture: refactoring code and data, and defining REST APIs.
  13. Application Deployment and Environment Orchestration. Developers are paid to write code, not maintain deployment and configuration scripts.
  14. Test Service and Environment Virtualization. If you are doing 2-week Sprints, but it takes 3 weeks to get a Test Server, how long are your Sprints?
  15. Release Management. It is not possible to patch the software of a missile AFTER it has been launched.
  16. Operational Readiness for SRE. Shift thinking from Mean Time Between Failure (MTBF) to Mean Time To Repair (MTTR).
  17. MTTR Calculus. Mean Time to Repair = Mean Time to Detect + Mean Time to Triage + Mean Time to Restore + Mean Time to Pass Blame…
  18. Antifragile Systems. Antifragile: things that are neither fragile nor robust, but rather thrive in chaos.
  19. Delivering Antifragile Systems
     • Servers may go “red,” services are always “green”
     • Cattle not pets
     • Fragility in systems actually comes from a desire to make them too robust.
  20. Organizational Change
     • “Everyone is responsible for Delivering to Production”
     • Squad-Tribe-Guild Team Model
     • SRE Squads
     • A Learning Organization
  21. When DevOps meets SRE
     • DevOps: “Everyone is responsible for delivery to production.”
     • SRE: “(Everyone) is responsible for delivering Continuous Business Value”
  22. Any questions? THANK YOU. @sd_architect | http://sdarchitect.blog
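
The availability figures on slide 5 follow from simple arithmetic: allowed downtime is the length of the period multiplied by the unavailable fraction. The sketch below is not from the deck; the function names are hypothetical, and it assumes a 365.25-day year with a month taken as one twelfth of that, which appears to be the convention behind the slide's monthly and yearly numbers.

```python
# Assumed period lengths in seconds (a 365.25-day year is an assumption).
DAY = 24 * 60 * 60
WEEK = 7 * DAY
YEAR = 365.25 * DAY
MONTH = YEAR / 12


def allowed_downtime_seconds(availability_percent, period_seconds):
    """Downtime budget: the period times the fraction of time unavailable."""
    return period_seconds * (1 - availability_percent / 100)


def format_duration(seconds):
    """Render seconds as 'Xm Y.Ys', or 'Y.Ys' when under a minute."""
    minutes, secs = divmod(seconds, 60)
    if minutes:
        return f"{int(minutes)}m {secs:.1f}s"
    return f"{secs:.1f}s"


if __name__ == "__main__":
    periods = (("daily", DAY), ("weekly", WEEK),
               ("monthly", MONTH), ("yearly", YEAR))
    for slo in (99.999, 99.99, 99.95):
        budget = ", ".join(
            f"{name} {format_duration(allowed_downtime_seconds(slo, length))}"
            for name, length in periods
        )
        print(f"{slo}%: {budget}")
```

Running the script prints one line per availability level, which can be compared against the slide; small differences in the last digit come down to rounding versus truncation.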

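The MTTR decomposition on slide 17 can be illustrated with a small calculation. This is a minimal sketch, not something from the deck: the incident records, phase names, and durations are made up for illustration, and the tongue-in-cheek "Mean Time to Pass Blame" term is left out.

```python
from statistics import mean

# Hypothetical incident records: minutes spent in each phase (made-up data).
PHASES = ("detect", "triage", "restore")
incidents = [
    {"detect": 5, "triage": 10, "restore": 30},
    {"detect": 15, "triage": 20, "restore": 10},
]


def mean_time_per_phase(records):
    """Average duration of each phase across all incidents."""
    return {phase: mean(r[phase] for r in records) for phase in PHASES}


def mean_time_to_repair(records):
    """MTTR as the sum of the per-phase means, as on the slide."""
    return sum(mean_time_per_phase(records).values())


if __name__ == "__main__":
    for phase, minutes in mean_time_per_phase(incidents).items():
        print(f"mean time to {phase}: {minutes} min")
    print(f"MTTR: {mean_time_to_repair(incidents)} min")
```

Breaking MTTR into phases like this shows where the time goes, so an improvement effort can target detection, triage, or restoration specifically.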