2. About Me
• One of the founding members of “Devopsdays”
• Co-author of “The Devops Handbook”
• Author of the “Introduction to Devops” course on Linux Foundation edX
• Podcaster at devopscafe.org
• Co-founder of the Devops Enterprise Summit
• Ninth person in at Chef (VP of Customer Enablement)
• Formerly Director of Devops at Dell
• Founder of Socketplane (acquired by Docker)
• 10 startups over 25 years
https://github.com/botchagalupe/my-presentations
12. Devops Practices and Patterns
• Continuous Delivery
  • Everything in version control
  • Small batch principle
  • Trunk-based deployments
  • Manage flow (WIP)
  • Automate everything
• Culture
  • Everyone is responsible
  • Done means released
  • Stop the line when it breaks (see the sketch below)
  • Remove silos
itrevolution.com/devops-handbook
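Several of these practices are mechanical enough to sketch in code. Below is a minimal, hypothetical illustration (mine, not from the deck or the Handbook) of small batches flowing through trunk with a "stop the line" gate; `Change`, `run_tests`, and `deploy` are made-up stand-ins, not any real CI tool's API.

```python
# Minimal sketch of trunk-based flow with a "stop the line" gate.
# Change, run_tests, and deploy are illustrative stand-ins, not a real CI API.
from dataclasses import dataclass

@dataclass
class Change:
    author: str
    passes_tests: bool  # stand-in for a real test run

def run_tests(change: Change) -> bool:
    return change.passes_tests

def deploy(change: Change) -> None:
    print(f"deployed change by {change.author}")

def integrate(trunk_queue: list[Change]) -> None:
    """Integrate small batches into trunk in order; halt on first failure."""
    line_stopped = False
    for change in trunk_queue:
        if line_stopped:
            print(f"line stopped: holding change by {change.author}")
            continue
        if run_tests(change):
            deploy(change)       # done means released
        else:
            line_stopped = True  # stop the line when it breaks
            print(f"build broken by {change.author}: swarm the fix before new work")

integrate([Change("amy", True), Change("raj", False), Change("lee", True)])
```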
14. Recent IT Performance Data is Compelling
High performers compared to their peers…
• 30x more frequent deployments
• 200x faster lead times
• 60x the change success rate
• 168x faster mean time to recover (MTTR)
• 2x more likely to exceed profitability, market share & productivity goals
• 50% higher market capitalization growth over 3 years*
Data from 2014/2015 State of DevOps Report - https://puppetlabs.com/2015-devops-report
15. (Build of the previous slide.) The same results, summarized: Faster. Higher Quality. More Effective. 2555x
18. Organizational culture was one of the strongest predictors of both IT performance and the overall performance of the organization.
19. Devops is about Humans
Devops is a set of practices and patterns that turn human capital into high-performance organizational capital.
21. Google
• Over 15,000 engineers in over 40 offices
• 4,000+ projects under active development
• 5500+ code submissions per day (20+ p/m)
• Over 75M test cases run daily
• 50% of code changes monthly
• Single source tree
22. Amazon
• 11.6 second mean time between deploys
• 1,079 max deploys in a single hour
• 10,000 mean number of hosts simultaneously receiving a deploy
• 30,000 max number of hosts simultaneously receiving a deploy
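A quick back-of-the-envelope check (my arithmetic, not the slide's): 86,400 seconds per day ÷ 11.6 seconds between deploys ≈ 7,400 production deployments per day.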
23. Unicorns and Horses (Enterprises)
Shamelessly stolen and repurposed from: Pete Cheslock
24. Enterprise Organizations
• Ticketmaster - 98% reduction in MTTR
• Nordstrom - 20% shorter Lead Time
• Target - Full Stack Deploy 3 months to minutes
• USAA - Release from 28 days to 7 days
• ING - 500 application teams doing devops
• CSG - From 200 incidents per release to 18
28. Parts Unlimited - "Major Release 6" (Early 2014)
[The original slide is a wall-sized value stream map; the recoverable content follows.]
Phases: Initiation and Planning → Design → Dev Breakdown → Server Requirements Gathering → Server Approval and Assignment → Provisioning → Dev/Test → Staging Release → Production Release
The flow, in brief:
• Initiation and Planning (3 months, plus a 3-month hold/pause): project initiation; finance (ZRA) approves the project; monthly steering meeting with C-level steering committee input; project charter (high-level stories, project info, description, budget, schedule); PM and stakeholders (tech and biz) create a work breakdown in MS Project (high-level milestones, resource planning)
• Design (3 months): requirements created in project meetings (MS Office, SharePoint): detailed requirements for new features, technology refreshes, ERD (infra req), DRD (dev req), BRD (biz req); tech leads, architects, vendor and Ops architects create the design, tech requirements, and high-level server tickets
• Dev Breakdown (4 weeks): detailed analysis and requirements captured as Jira "stories" and Confluence pages, with ticket dependencies maybe tracked; team leads and PMs assign requirements and add detail for their teams; Architecture Review Board ("Bill" plus an architects working group; Ops sometimes); Dev leadership assigns dev teams at an Ops intake meeting (1 week)
• Server Requirements Gathering, Approval, and Assignment (1 week + 1 week): server request spreadsheet (DB, app, or web) created from the request, routed for approval (budget, appropriate resources), then approved into the Ops delivery queue
• Provisioning (1-6 weeks): delivery manager ("Matt") gives a "heads up" in ServiceNow and assigns a delivery engineer, who clarifies or confirms requirements with Dev or QA; server provisioned (with rework); DBA and app/web validation; data restore (1 week); compute, network, facility, cabling, and storage handled separately ("Linda", Ops PM)
• Dev/Test (16 weeks total: 6 weeks (H/C: 6), 3 weeks (H/C: 8), 4 weeks (H/C: 8), 3 weeks (H/C: 14); 2-week sprint cycle time): development in existing dev environments; test data acquired from mainframe service data setup by the Ops DBA and test data configuration manager ("Jennifer"); deploy to integration; Dev/QA integration and regression testing focused on the service (TestLink); sprint review
• Staging Release: product owners (using their own criteria) create a CAB ticket; the scrum team (or the Ops team, if legacy) pushes the deployment to stage with an email notification; new arch builds VMs via Jira, legacy goes through ServiceNow; >100 change tickets created in ServiceNow
• Production Release: QA lead, PMs, and QAs run end-to-end testing in prod; go/no-go decision meeting with team leads; Ops deploys by cluster via Jira; "remove feature flag" (if new arch)
Problems annotated on the map:
• 9+ months of planning before implementation starts (and information/requirements still incorrect or incomplete!)
• Gaps in requirements: licenses, dependencies on 3rd-party apps, capacity planning always seems low ("robbing Peter to pay Paul"), hardware not purchased in advance even though we know it's coming; duplicate info across different documents
• Procurement of physical servers can take months (procurement plus facilities lead times)
• Too many environments in one ticket causes audit confusion; piecemeal requests ("2 this week, 3 next week")
• One queue for the delivery team with ~1,000 tickets at once; capacity issues cause delay; the team is often told to stop everything and do something else; steering committee pressure to "Fix tickets!" and reset the delivery date
• Dev and QA told to submit server requests 6-8 weeks in advance (only done 50% of the time); requests sometimes go directly to delivery, and ad-hoc requests get lost, adding 2-3 week delays
• 30% of the delivery team's time spent "consulting" on performance and dealing with unfounded requests for more capacity; 3-5 days to fix
• No monitoring or backup for some environments; staging frequently down (external service updates take it offline), with lots of contention
• Often skips CAB, and what CAB reviews is often not what was built
• All-manual environment setup that only one person really knows, with low data quality; manual processes with lots of back and forth; many tickets with mismatched priorities; mostly manual testing; manual, per-cluster deploys
• Low success rates (S/R) at many steps: 90%, 75%, 55%, 50%, 20%, 15%, ~10% (rolled together in the sketch below)
Improvement experiments identified:
1. Fully Automated Environment Provisioning
2. Visualization of flow of work and expected upcoming work
3. Standard product catalog ("Environments on Demand")
4. Shorten from Design to Implementation
5. New "white glove" engagement model
7. Small Batches
8. Write end-to-end customer func. tests
9. Service Verification test writing: shift left to Dev (test early)
10. Test data setup automation
11. Resolve interface to legacy
12. Remove Bottleneck and Environment Contention (test more)
13. Dev Deploy to Prod for legacy
14. Unify change management tools
15. Tool
• Make the work visible for all
• Manage flow and eliminate waste
• Build alignment and consensus across team boundaries
• Empower teams to find and fix what is getting in the way
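The map's numbers lend themselves to simple arithmetic. The sketch below (mine, not from the deck) shows the two calculations a value stream map like this supports: total lead time across steps, and the rolled success rate, using a few of the slide's S/R figures as illustrative stand-ins rather than exact per-step data.

```python
# Value stream arithmetic: total lead time and rolled success rate (S/R).
# Durations and S/R values below are illustrative stand-ins loosely drawn
# from the map, not exact figures for each step.

steps = [
    # (name, lead_time_weeks, success_rate)
    ("Planning and design",      26, 0.90),
    ("Server approval",           2, 0.75),
    ("Provisioning",              6, 0.50),
    ("Dev/Test",                 16, 0.55),
    ("Staging and prod release",  3, 0.20),
]

total_lead_time = sum(weeks for _, weeks, _ in steps)

# Rolled S/R: the chance work passes every step complete and accurate the
# first time is the product of the per-step rates.
rolled = 1.0
for _, _, sr in steps:
    rolled *= sr

print(f"Total lead time: {total_lead_time} weeks")  # 53 weeks
print(f"Rolled success rate: {rolled:.1%}")         # ~3.7% first-pass yield
```

Even with generous per-step rates, the rolled figure collapses, which is why the countermeasures above attack handoffs and queues rather than any single step.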
29. • Small Batch
• Reduce Work in Process (WIP)
• 1x1 Flow
• Reduce Bottlenecks (TOC)
• Optimize Globally
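A toy queueing calculation makes the small-batch and 1x1-flow bullets concrete. The numbers are illustrative assumptions of mine (10 items, 3 stations, 1 hour per item per station), not anything from the deck: moving work one piece at a time gets the first item out in 3 hours instead of 21, and the last out in 12 instead of 30.

```python
# Toy model: full-batch transfer vs. one-piece (1x1) flow.
# Assumptions (mine, illustrative): 10 items, 3 stations, 1 hour per item.

ITEMS, STATIONS, HOURS_PER_ITEM = 10, 3, 1

def completion_times(batch_size: int) -> list[float]:
    """Hour at which each item leaves the last station when items are
    transferred between stations in batches of batch_size."""
    done = [0.0] * ITEMS
    for _ in range(STATIONS):
        station_free = 0.0
        for start in range(0, ITEMS, batch_size):
            batch = range(start, min(start + batch_size, ITEMS))
            # A batch starts when the station is free and the whole batch
            # has arrived from the previous station.
            begin = max(station_free, max(done[i] for i in batch))
            for k, i in enumerate(batch):
                done[i] = begin + (k + 1) * HOURS_PER_ITEM
            station_free = done[batch[-1]]
    return done

for bs, label in ((ITEMS, "full batch"), (1, "one-piece flow")):
    t = completion_times(bs)
    print(f"{label}: first item done at hour {t[0]:.0f}, last at hour {t[-1]:.0f}")
# full batch:     first item done at hour 21, last at hour 30
# one-piece flow: first item done at hour 3,  last at hour 12
```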
32. I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.
- Bruce Lee
33. Toyota is not a story about techniques. It's an organization defined primarily by the unique behavior routines it continually teaches to all its members.
- Mike Rother (Page 262-263)
44. • Capability 1: Seeing problems as they occur
• Complex work is managed so that problems in design are revealed
• They see problems as they occur, through relentless testing of assumptions
• Capability 2: Swarming and solving problems as they are seen, to build new knowledge
• Problems that are seen are solved so that new knowledge is built quickly
• Improvement of daily work is prioritized above daily work
• Capability 3: Spreading new knowledge throughout the organization
• New local knowledge and improvements are turned into global improvements, shared throughout the organization
• Learning is fed back to prevent future failures
• Capability 4: Leading by developing
• The job of leaders is not command and control, but to create other capable leaders who can perpetuate this system of work
51. ▪ Views on Human Error
▪ The old view of human error (First Story)
▪ Human error is the cause of accidents
▪ To explain failure, you must seek failure
▪ You must find people's inaccurate assessments, wrong decisions, and bad judgments
52. ▪ Views on Human Error
▪ The new view of human error (Second Story)
▪ Human error is a symptom of trouble deeper inside a
system
▪ To explain failure, do not try to find where people
went wrong
▪ Instead, find how people’s assessments and actions
made sense at the time, given the circumstances that
surrounded them
53. ▪ Bad Apple Theory - Throw away the bad apples
▪ Complex systems are basically safe, they need to be
protected from unreliable people (bad apples)
▪ Human errors cause accidents: humans are the
dominant contributor to more than two thirds of mishaps
▪ Errors occur because of human loss of situation
awareness, complacency, negligence
▪ Errors are introduced to the system only through the
inherent unreliability of people.
54. What can go wrong usually goes
right, but then we draw the wrong
conclusion.
Murphy’s Law is Wrong!
Sidney Dekker
The Field Guide to Human Error
55. Blameless Culture
A blameless culture believes that
systems are NOT inherently safe
and humans do the best they can to
keep them running.
56. Thematic Vagabonding
People jump from one topic to the next,
treating all superficially, in certain cases
picking up topics dealt with earlier at a
later time; they don’t go beyond the
surface with any topic and seldom finish
with any. (Dörner, 1980)
58. ▪ Awesome Postmortems - Mindweather LLC
▪ in complex systems, there is no root cause, except…
▪ there are (multiple) conditions, some of which are
unknowable, unfixable, outside our control
▪ people did what made sense at the time, given the
information they had (no counterfactuals)
▪ failure and success are both normal in complex systems
▪ getting the full account* of what happened is more
important than blame/punishment
59. ▪ Hindsight bias:
▪ knew-it-all-along, to see the event as having been predictable,
counterfactuals
▪ Outcome bias:
▪ evaluating the quality of a decision when the outcome of that
decision is already known
▪ Availability bias:
▪ preference by decision makers for information and events that are more recent
▪ Fundamental attribution error:
▪ explain behavior in terms of internal disposition, such as
personality traits, abilities, motives, etc. as opposed to external
situational factors
60. ▪ Just Culture at Etsy (John Allspaw)
▪ Encourage learning by holding blameless post-mortems on outages and accidents
▪ Understand how accidents happen, in order to better equip ourselves to keep them from happening in the future
▪ Gather details from multiple perspectives on failures, and don't punish people for making mistakes
▪ Enable and encourage people who do make mistakes to be the experts on educating the rest of the organization how not to make them in the future
61. ▪ Just Culture at Etsy (John Allspaw)
▪ Accept that there is always a discretionary space where humans can decide to take action or not, and that the judgment of those decisions lies in hindsight
▪ Accept that the Hindsight Bias will continue to cloud our assessment of past events, and work hard to eliminate it
▪ Accept that the Fundamental Attribution Error is also difficult to escape, so we focus on the environment and circumstances people are working in when investigating accidents
68. A learning organization is a place
where people are continually
discovering how they create their
reality.
- Peter Senge
69. ▪ Five Disciplines must be adopted to become a
learning organization
▪ Systems Thinking
▪ Personal Mastery
▪ Mental Models
▪ Shared Vision
▪ Team Learning
70. Ladder of Inference
Chris Argyris
(top rung to bottom)
• Action
• Beliefs
• Conclusions
• Assumptions
• Meanings
• Select
• Observe
71. Ladder of Inference
▪ Can create bad judgement
▪ Our assumptions can lead us to bad conclusions
▪ Question your assumptions and conclusions
▪ Seek contrary data
▪ Make your assumptions visible to others
▪ Invite others to test your assumptions and conclusions
▪ Inquire into other people's assumptions and conclusions
▪ Move down the ladder instead of up
72. Ladder of Inference - Bad Judgement
▪ Observe - Notice people in the first row
▪ Select - Person in the front row keeps looking at their phone
▪ Meaning - Not listening to my presentation
▪ Assumption - He is not interested
▪ Conclusion - Doesn’t like my new idea
▪ Beliefs - Their team always blocks new ideas
▪ Action - I send a nasty email to their boss
73. Ladder of Inference - Alternative Assumption
▪ Observe - I notice people in the first row
▪ Select - Person in the front row keeps looking at their phone
▪ Meaning - Not listening to my presentation
▪ Assumption - Try and engage with a question (safely)
▪ Conclusion - Might find out that they are late for another meeting and they really don't want to miss this one… so they sent an email notifying the next meeting's team that they will be late…
▪ Beliefs - They are very excited about this new idea
▪ Action - Both teams set up another meeting to engage
82. ▪ Anomaly Response
▪ Computers do not resolve outages… people do
▪ Trade-offs under pressure
▪ Cognition in the wild
▪ An outage is not a detective story
▪ With each step the story changes
▪ Need to see what's happening with incomplete information
▪ Tools don't always make things better
83. ▪ Anomaly Response - Internet Services are Opaque
▪ Network layer abstractions
▪ Variability in network performance
▪ Interdependent and decoupled services
▪ Internet based distributed computing
▪ Geographically distributed communication
▪ Open internet facing interactions
85. ▪ Anomaly Response - Dynamic Fault Management
▪ Cascading effects
▪ Tempo changes and time pressure
▪ Multiple interleaved tasks
▪ Multiple interacting goals
▪ Need to revise assessments as new evidence comes in
86. "In dynamic fault management,
intervention precedes or is interwoven
with diagnosis"
- Woods (1994)