It is impossible to overstate how much I’ve learned since co-authoring The Phoenix Project, DevOps Handbook, and Accelerate. I’m so excited that after years of work, The Unicorn Project will be published later this year.
This book is my attempt to frame what I’ve learned studying technology leaders adopting DevOps principles and patterns in large, complex organizations, often having to fight deeply entrenched orthodoxies. And yet, despite huge obstacles, they build incredibly effective and innovative teams that become beacons of greatness and inspire us all.
In this book, we follow a senior lead developer and architect as she is exiled to the Phoenix Project, to the horror of her friends and colleagues, as punishment for contributing to a payroll outage. She tries to survive in what feels like a heartless and uncaring bureaucracy, forced to work within a system where no one can get anything done without endless committees, paperwork, change requests, and approvals. Decades of technical debt make even small changes difficult or impossible, often causing catastrophic outcomes and fear of punishment.
It gives me tremendous delight and gratification that this book is not about the bridge crew of the Starship Enterprise. Instead, it is about the redshirt engineers, whose heroic work, as it turns out, matters most to the long-term survival of almost every organization.
In my previous books, I’ve focused on principles and practices (e.g., Three Ways, Four Types of Work). However, I’ve always wanted to describe the spectrum of cultural, experiential and value decisions we make that either enable greatness, or create chronic suffering and underperformance. I call these the Five Ideals, and they are currently as follows:
• The First Ideal — Locality and Simplicity
• The Second Ideal — Focus, Flow and Joy
• The Third Ideal — Improvement of Daily Work
• The Fourth Ideal — Psychological Safety
• The Fifth Ideal — Customer Focus
In this talk, I’ll share with you my goals and aspirations for The Unicorn Project, describe in detail the Five Ideals, along with my favorite case studies of both ideal and non-ideal, and why I believe more than ever that DevOps will be one of the most potent economic forces for decades to come.
2. @RealGeneKim
My Definition of DevOps
The architecture, technical practices, and cultural norms that enable us to increase our ability to deliver applications and services quickly and safely, which enables rapid experimentation and innovation, and the fastest delivery of value to our customers, while ensuring world-class security, reliability, and stability, so that we can win in the marketplace.
14. @RealGeneKim
The Problems That Still Remain
Absence of all the invisible structures needed to enable developer productivity
The orthogonal problem of getting data from where it resides to where it needs to be used
Strong opposition to supporting new ways of working
Ambiguity about which behaviors need to be supported during a transformation
15. @RealGeneKim
The Five Ideals
1. Locality and Simplicity
2. Focus, Flow, and Joy
3. Improvement of Daily Work
4. Psychological Safety
5. Customer Focus
17. @RealGeneKim
Elite vs. Low Performers
Metric               | Elite                              | Low                  | Difference
Deployment Frequency | On-demand (multiple times per day) | Monthly or quarterly | 208x
Deployment Lead Time | < 1 hour                           | 1 day to 1 week      | 2,555x
Change Failure Rate  | 0-15%                              | 46-60%               | 7x
Mean Time to Restore | < 1 hour                           | 1 week to 1 month    | 2,604x
Source: Google/DORA: 2019 State Of DevOps Report: https://cloud.google.com/devops/state-of-devops/
18. @RealGeneKim
Elite vs. Low Performers
Metric               | Elite                              | Low                  | Difference
Deployment Frequency | On-demand (multiple times per day) | Monthly or quarterly | 208x
Deployment Lead Time | < 1 hour                           | 1 week to 1 month    | 106x
Change Failure Rate  | 0-15%                              | 46-60%               | 7x
Mean Time to Restore | < 1 hour                           | 1 week to 1 month    | 2,604x
Source: Google/DORA: 2019 State Of DevOps Report: https://cloud.google.com/devops/state-of-devops/
20. @RealGeneKim
Elite vs. Low Performers
Metric               | Elite                              | Low                  | Difference
Deployment Frequency | On-demand (multiple times per day) | Monthly or quarterly | 208x
Deployment Lead Time | < 1 hour                           | 1 week to 1 month    | 106x
Change Failure Rate  | 0-15%                              | 46-60%               | 7x
Mean Time to Restore | < 1 hour                           | Less than one day    | 2,604x
Source: Google/DORA: 2019 State Of DevOps Report: https://cloud.google.com/devops/state-of-devops/
21. @RealGeneKim
High Performers Are More Secure And Controlled
2x less time spent remediating security issues
29% more time spent on new work
Source: Google/DORA: 2018 State Of DevOps Report: https://cloudplatformonline.com/2018-state-of-devops.html
22. @RealGeneKim
High Performers Win In The Marketplace
2x more likely to exceed profitability, market share & productivity goals
2x more likely to achieve organizational and mission goals, customer satisfaction, quantity & quality goals
Source: Google/DORA: 2018 State Of DevOps Report: https://cloudplatformonline.com/2018-state-of-devops.html
23. @RealGeneKim
High Performers Win In The Marketplace
2.2x higher employee Net Promoter Score
50% higher market capitalization growth over 3 years*
Source: Google/DORA: 2018 State Of DevOps Report: https://cloudplatformonline.com/2018-state-of-devops.html
25. @RealGeneKim
When we can safely, quickly,
reliably, securely achieve
all the goals, dreams and
aspirations of our business…
26. @RealGeneKim
The Five Ideals
1. Locality and Simplicity
2. Focus, Flow, and Joy
3. Improvement of Daily Work
4. Psychological Safety
5. Customer Focus
28. @RealGeneKim
The Birth And Death Of Etsy Sprouter
A story about teams of engineers implementing
changes
2008: Devs and DBAs
2009: Devs and DBAs and Sprouter team
2010: Devs
31. @RealGeneKim
Architecture Enables Teams To…
…make large scale changes to the design of its system without the
permission of someone outside the team, or depending on other
teams
...complete its work without fine-grained communication and
coordination with people outside the team
...deploy and release its product or service on demand, independently
of other services the product or service depends upon
...do most of its testing on demand, without requiring an integrated
test environment
...perform deployments during normal business hours with negligible
downtime
Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report
33. @RealGeneKim
How Many People Do You Need To Feed?
Two pizza team
Feeding everyone in the building
Schedule lunch with 43 different people
34. @RealGeneKim
The First Ideal: Code
Ideal: anyone can implement what they need by
looking at one file or module, and make the
needed change
Kubernetes sidecars
Spring (http-retry, Dependency Injection)
Aspect Oriented Programming
Not Ideal: to make your needed change, you
have to understand and change all the files and
modules
35. @RealGeneKim
The First Ideal: Code
Ideal: changes can be independently
implemented and tested, isolated from other
components (composability)
Not Ideal: in order for changes to be
implemented and tested, the entire system must
be present (e.g., integrated test environment)
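As a minimal sketch of what "independently implemented and tested" can look like in code: the logic below receives its dependency as a parameter, so it can be exercised with an in-memory stand-in instead of an integrated test environment. The names `order_total`, `PriceLookup`, and `make_fake_prices` are hypothetical and chosen only for illustration; they are not from the talk.

```python
from typing import Callable, Dict, List

# Hypothetical example: the pricing logic depends only on a function it is
# handed, not on a live pricing service, so it can be tested on a laptop.
PriceLookup = Callable[[str], float]


def order_total(items: List[str], price_of: PriceLookup) -> float:
    """Sum the price of each item using the injected lookup function."""
    return sum(price_of(item) for item in items)


def make_fake_prices(prices: Dict[str, float]) -> PriceLookup:
    """Build an in-memory stand-in for the real pricing dependency."""
    def lookup(item: str) -> float:
        return prices[item]
    return lookup


# The whole "system" needed to test order_total is this dictionary.
fake = make_fake_prices({"wiper": 10.0, "filter": 5.5})
total = order_total(["wiper", "filter", "wiper"], fake)
```

In production the same function could be handed a lookup backed by a real service; the test never needs to know.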
36. @RealGeneKim
The First Ideal: Organization
Ideal: every team has the expertise, capability
and authority to satisfy customer needs
Not Ideal: in order to satisfy customer needs,
every team must escalate up two levels (and over
two, and down two)
39. @RealGeneKim
Team of Teams
Story of the Joint Special Operations Task Force battling a smaller, nimbler adversary in Iraq in 2004
Pushing decision making to the edges
40. @RealGeneKim
The First Ideal: Data
Ideal: every team has access to the data they
need, on-demand, quickly, accurately, and
securely
Not Ideal: in order to get the data they need,
teams must wait months, and hope that every
report won’t break
43. @RealGeneKim
As Your Ambassador From Dev
For decades, I self-identified as an Ops person…
2 years ago, I started to self-identify as Dev
Clojure / ClojureScript
LISP, functional programming, immutability
3000 lines of Objective-C -> 1500 lines of TypeScript/React -> 500 lines of ClojureScript
Development is so fun, and these days, you can do
miraculous things with so little effort
44. @RealGeneKim
Why Functional Programming
The famous French philosopher Claude Lévi-Strauss
would say of certain tools, ‘is it good to think with?’
Core FP concepts
Immutability
Pure functions
Composability
Pioneered by Haskell and OCaml. Popularized by Clojure, Erlang, Elm, Elixir, ReasonML
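The three core concepts can be sketched in a few lines. This is an illustration in Python rather than in any of the languages named above, and the `Account`, `deposit`, and `compose` names are hypothetical: a frozen dataclass stands in for immutability, `deposit` returns a pure function, and `compose` chains small functions into a larger one.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable


@dataclass(frozen=True)  # frozen=True makes instances immutable
class Account:
    owner: str
    balance: int


Step = Callable[[Account], Account]


def deposit(amount: int) -> Step:
    """Return a pure function that yields a NEW account with a higher balance."""
    return lambda acct: replace(acct, balance=acct.balance + amount)


def compose(*steps: Step) -> Step:
    """Compose steps left to right into a single function."""
    return lambda acct: reduce(lambda state, step: step(state), steps, acct)


original = Account(owner="alice", balance=100)
updated = compose(deposit(50), deposit(25))(original)
# original is untouched; updated is a separate value with the new balance.
```

Because nothing is mutated, each step can be reasoned about and tested on its own, which is the "good to think with" property the slide points at.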
46. @RealGeneKim
The Second Ideal: Focus and Flow
Ideal: your energy and time are focused on solving the business problem, and you’re having fun
Not Ideal: all your time is spent trying to solve
problems you don’t even want to solve (e.g.,
YAML files, Makefile and spaces in filenames,
bash)
47. @RealGeneKim
Never Have I Valued Infrastructure More
Things I detest now
Everything outside of my application
Connecting anything to anything
Updating dependencies
Secrets management
Bash
YAML
Patching
Building Kubernetes deployment files (mostly by Googling)
Figuring out why my cloud costs are so high
48. @RealGeneKim
The Value Of Platforms
Enable developer productivity
Self-service
On-demand
Immediacy and fast feedback
Focus and flow
Joy
Monitoring, deployment, environment creation,
security scans, orchestration…
52. @RealGeneKim
“What is your lead time
for changes?”
“How long does it take to go from
code committed to code successfully
running in production?”
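One way to answer that question is to record, for each change, when it was committed and when it was successfully running in production, then summarize the gaps. The sketch below assumes such timestamps are available; the sample data and the `median_lead_time` name are hypothetical, and using the median is a choice made here, not something the slide prescribes.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple

# Hypothetical data: (time committed, time successfully running in production).
ChangeRecord = Tuple[datetime, datetime]

changes: List[ChangeRecord] = [
    (datetime(2019, 6, 3, 9, 0), datetime(2019, 6, 3, 9, 40)),    # 40 minutes
    (datetime(2019, 6, 3, 13, 0), datetime(2019, 6, 3, 15, 0)),   # 2 hours
    (datetime(2019, 6, 4, 10, 0), datetime(2019, 6, 4, 10, 20)),  # 20 minutes
]


def median_lead_time(records: List[ChangeRecord]) -> timedelta:
    """Median time from code committed to code running in production."""
    seconds = [
        (deployed - committed).total_seconds()
        for committed, deployed in records
    ]
    return timedelta(seconds=median(seconds))


lead_time = median_lead_time(changes)
```

The median is used so that a single slow change does not dominate the summary; a mean or a percentile would be equally reasonable depending on what the team wants to see.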
53. @RealGeneKim
Source: The DevOps Handbook
Product Design and Development
Create new products and services that solve customer problems using hypothesis-driven delivery, modern UX, design thinking
Feature design and implementation may require work that has never been done before
Estimates are highly uncertain
Outcomes are highly variable
Change Committed Into Version Control
Product Delivery (Build, Test, Deploy)
Enable fast flow from development to production and reliable releases by standardizing work, reducing variability and batch sizes
Integration, test and deployment must be performed continuously, as quickly as possible
Cycle times should be well-known and predictable
Outcomes should have low variability
56. @RealGeneKim
The Second Ideal: Focus and Flow
Ideal: when you can implement and test your
feature on your Dev laptop, and learn whether it
worked in seconds
Not Ideal: when the only way you can determine
whether your feature worked is waiting minutes,
hours, or days… or weeks…
57. @RealGeneKim
The Second Ideal: Focus and Flow
Ideal: trunk based development
Not Ideal: 5 days merging, 50 people in
conference rooms
60. @RealGeneKim
Not Ideal
“In manufacturing, the absence of effective feedback often
contributes to major quality and safety problems. In one
well-documented case at the General Motors Fremont manufacturing
plant, there were no effective procedures in place to detect
problems during the assembly process, nor were there explicit
procedures on what to do when problems were found.
“As a result, there were instances of engines being put in
backward, cars missing steering wheels or tires, and cars even
having to be towed off the assembly line because they wouldn’t
start.”
Source: DevOps Handbook
61. @RealGeneKim
Ideal
Create as much feedback in our system as possible, from as
many areas as possible, sooner, faster, and cheaper, with as
much clarity between cause and effect as possible.
Why? Because the more assumptions we can
invalidate, the more we learn, improving our ability
to fix problems and innovate.
Source: DevOps Handbook
63. @RealGeneKim
How many times is the andon cord pulled in a
typical day at a Toyota manufacturing plant?
3,500 times per day
Source: http://www.gembapantarei.com/2008/04/how_many_times_do_you_pull_the_andon_cord_each_day.html
66. @RealGeneKim
Fast Push To Market — Continued
Features
Defects
Defect fixing dominates work
Site reliability tanks
Slower and slower velocity
Customers leave
Morale plunges
Devs leave because everything is hard
Quality
Debts & Risks
69. @RealGeneKim
Near Death Experiences
● eBay (1999)
● Microsoft (2002): Bill Gates memo
● Google (2005): Automated testing culture
● Amazon (2004): Jeff Bezos memo
● Twitter (2008)
● LinkedIn (2009)
● Etsy (2009)
70. @RealGeneKim
2002 Microsoft Security Standdown
Famously, after SQL Slammer, Microsoft required every product group to freeze feature development
Source: https://www.wired.com/2002/01/bill-gates-trustworthy-computing/
73. @RealGeneKim
Quote from Marty Cagan from his book
Inspired
The deal [between product owners and] engineering goes like this: Product
management takes 20% of the team’s capacity right off the top and gives this to
engineering to spend as they see fit. They might use it to rewrite, re-architect, or
re-factor problematic parts of the code base…whatever they believe is necessary
to avoid ever having to come to the team and say, ‘we need to stop and rewrite [all
our code].’ If you’re in really bad shape today, you might need to make this 30% or
even more of the resources. However, I get nervous when I find teams that think
they can get away with much less than 20%.
Cagan notes that when organizations do not pay their “20% tax,” technical debt
will increase to the point where an organization inevitably spends all of its cycles
paying down technical debt. At some point, the services become so fragile that
feature delivery grinds to a halt because all the engineers are working on reliability
issues or working around problems.
74. @RealGeneKim
The Third Ideal: Enabling Greatness
Ideal: 3-5% of developers dedicated to improving
developer productivity
Google: likely 1,500+ devs ($1B+)
Microsoft: likely over 3,000 devs
Not Ideal: developer productivity work is assigned to
summer interns and “people not good enough to be developers”
76. @RealGeneKim
Breaking The Bottlenecks In The Flow
Environment creation
Code deployment
Test setup and run (mention @rohansingh)
Overly tight architecture
Development
Product management
77. @RealGeneKim
"Automated tests transform fear into boredom."
-- Eran Messeri, Google
Google Dev And Ops (2013)
15,000 engineers, working on 4,000+ projects
All code is checked into one source tree
(billions of files!)
5,500 code commits/day
75 million test cases are run daily
79. @RealGeneKim
The Third Ideal: Improvement
Not Ideal: No one cares if someone breaks the
build, or checks in code that breaks our tests
Ideal: When someone breaks our build or our
tests, fixing it becomes the most important work
of the moment
80. @RealGeneKim
The Third Ideal: Improvement
Not ideal: When someone needs a peer review,
that person has to wait until someone else frees
up
Ideal: Whatever I’m working on, if someone
needs a peer review, I drop whatever I’m doing to
help
83. @RealGeneKim
DevOps Enterprise: Lessons Learned
In 2019, we’ll hold the sixth year of the DevOps Enterprise Summit, a conference for horses, by horses
Over the years, we’ve had nearly 350 leaders from:
Capital One, KeyBank, Barclays, GE Capital, ING Bank, Fidelity, PNC, ADP, BofA, Western Union, BBVA
Nationwide Insurance, Zurich Insurance, Allstate, Hiscox, Aviva, LV=
Walmart, Nordstrom, Target, Macy’s, Marks and Spencer
Nike, Adidas, Sherwin Williams
Verizon, Telstra, T-Mobile, Orange, CSG
Raytheon, Lockheed Martin, Northrop Grumman, CSRA, Jaguar Land Rover, Fiat/Chrysler, Cisco
Disney, Ticketmaster, NBC/Universal, Comcast
Kaiser Permanente
US Citizenship & Immigration Services, UK HM Revenue Collection, DISA Forge.mil, NZ Ministry of Social Development, UK Welfare and Pensions, US Joint Warfare Analysis Center
Amazon PrimeNow, CA, Compuware, Google Search, IBM, MicroFocus, Microsoft, SAP
91. @RealGeneKim
Modeling Continual Learning
“When adult learners start trying to learn a new
skill, they will often do it in private, because of the
embarrassment associated with doing something
they’re not good at.”
We can help by saying “I don’t know”
93. @RealGeneKim
The Fifth Ideal: Focus On The Customer
Not Ideal: Functional silo managers prioritize silo
goals over business goals
Ideal: Functional silo managers make decisions
based on what the customer values, and help
ensure their teams have the skills to thrive in the
long term
97. @RealGeneKim
The Five Ideals
1. Locality and Simplicity
2. Focus, Flow, and Joy
3. Improvement of Daily Work
4. Psychological Safety
5. Customer Focus
98. @RealGeneKim
Want To Learn More?
To receive this presentation and the following:
PDF and audio excerpts from The Unicorn Project
Eight excerpts from Beyond The Phoenix Project audio
series w/John Willis
The 140 page excerpt of The DevOps Handbook
The 140 page excerpt of The Phoenix Project
Videos and slides from DevOps Enterprise 2014-2019
One hour excerpt of The Phoenix Project audiobook
Just pick up your phone, and send an email:
To: realgenekim@SendYourSlides.com
Subject: devops
Editor's Notes
[picture of messy data center] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out. Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
Who are they auditing? IT operations.
I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people.
Memory leak? No problem, we’ll do hourly reboots until you figure that out.
Who here is from IT operations?
Bad day:
Not as prepared for the audit as they thought
Spending 30% of their time scrambling, generating presentation for auditors
Or an outage, and the developer is adamant that they didn’t make the change – they’re saying, “it must be the security guys – they’re always causing outages”
Or, there are 50 systems behind the load balancer, and six systems are acting funny. What’s different, and who made them different?
Or every server is like a snowflake, each having their own personality
We as Tripwire practitioners can help them make sure changes are made visible, authorized, deployed completely and accurately, find differences
Create and enforce a culture of change management and causality
Source: Flickr: birdsandanchors
Source: RyanJLane
Bus factor is the number of people that need to be hit by a bus before your project comes to a screeching halt.
In TPP, we had a bus factor of 1: Brent. Every outage required Brent, and every major work item required Brent. If Brent got hit by a bus, the company was legitimately at risk of going out of business.
In the Unicorn Project, I love the concept of the lunch factor.
How many people do you need to take out to lunch?
Amazon has the notion of a two pizza team. No team should be larger than can be fed by two pizzas. Each team can independently develop, test, and deploy value to the customer.
No need to take anyone out to lunch.
However, in most organizations, to make a small change, everything is so tightly coupled together, you have to take everyone out to lunch.
It’s not two pizzas, it’s multiple truckloads of pizzas.
April 22, 2011
We used the most powerful analytical tool to generate this graph: not SPSS, R, Tableau, PLA Sim. We used pivot tables in Excel.
The book is about the redshirts from Star Trek, The A-Team, Hogan’s Heroes, and the movie Brazil
20 years: self-identified as an Ops person