SysAdmin to SRE:
Solving the Last Mile Problem
Damon Edwards
@damonedwards
Operations:
The Last Mile
Silos
Queues
Excessive Toil
Low Trust
https://www.youtube.com/watch?v=1zUtBLZ4Lus
SRE
(Site Reliability Engineering)
“SRE…
When you ask
software engineers
to do operations”
“SRE…
Next-generation,
cloud-native
Operations”
Class SRE implements DevOps
“SRE…
When Ops does
more engineering
than Ops”
SRE
Why SRE?
Simon Sinek
Start with
“why?”
Story time…
It was just another Thursday…
Call Center
Agent
Call Center
Agent
My browser
times out!
Wow, this is
so slow!
I can’t log in
What a c#@p
service!
I can’t log in
Barely works
It’s broken
Customers
Thursday 10:00am PDT
(1200 Agents)
Call Center
Agent
Technical
Support
Service
Desk
Many tickets
Many calls
Customers
“Stuff
isn’t
working”
VIP Customers
“…but monitoring
is all green”
Service
Desk
OK
OK
OK
OK
OK
Ops Ops
Call Center
Agent
Customer
Now it works
Now it works
Service
Desk
?
Ops Ops
3:30pm
The next day…
Call Center
Agent
Call Center
Agent
My browser
times out!
Wow, this is
so slow!
I can’t log in
Are you kidding
me?
How hard is it to
run a website?
Soo Sloooow
It’s broken
Customers
Friday 9:00am PDT
Call Center
Agent
Technical
Support
Service
Desk
Many tickets
Many calls
Customers
“Stuff
isn’t
working”
VIP Customers
“…but monitoring
is all green”
Service
Desk
OK
OK
OK
OK
OK
Service
Desk
Escalate!
Incident
Commander
Ticket
Launch the
incident bridge
Ops
Incident
Commander
Ops
Dev
Sec
Ops
Bridge
Call
Ops
Not me…
Not me…
Not me…
Not me…
No code
updates
Probably not the new server
hardening process or the network
changes…
Headcount: 40
Ops
Ops
Ops
Uhh… WHAT new
server hardening
process and network
changes?
Sec
We were going to fail
audit… you didn’t get
the email?
Dev
Bridge
Call
No code
updates
War
Room
SysAdmin
“Try
this”
Test
Platform
“Try
this”
Test
Network
“Try
this”
Test
Security
“Try
this”
Test
Storage
“Try
this”
Test
SysEng
“Try
this”
Test
Incident
Commander
“Theory: new
security updates”
Call Center
Agent
Customer
Now it works
Now it works
Call Center
Manager
What is going
on?
3:30pm
Headcount: 30
Ops
Ops
Sec
Ops
Ops
Ops
Rollback:
-OS changes
-Network changes
Over the weekend
QA
Headcount: 10
Monday morning…
Call Center
Agent
Call Center
Agent
… so frustrating
Not again…
I can’t log in
Are you kidding
me?
How hard is it to
run a website?
Soo Sloooow
It’s broken
Customers
Monday 10:00am PDT
Call Center
Agent
Technical
Support
Service
Desk
Many tickets
Many calls
Customers
“Stuff
isn’t
working”
VIP Customers
“…but monitoring
is all green”
Service
Desk
OK
OK
OK
OK
OK
Customer Systems
Lead Dev
ding!
Ignore.
Incident
Commander
Hey did you see
that ticket?
Scrum
sigh.
I’ll take a look
Customer Systems
Lead Dev
Something is wrong with
the database connection…
… But our code didn’t
change.
DBA
No recent database
updates.
Dev
Bridge
Call
No code
updates
War
Room
DBA
“Try
this”
Test
SysAdmin
“Try
this”
Test
Network
“Try
this”
Test
Security
“Try
this”
Test
SysEng
“Try
this”
Test
Incident
Commander
“New Theory: It’s
the database
connection”
Call Center
Manager
What is going
on?
Headcount: 20
Call Center
Agent
Customer
Now it works
Now it works
4:00pm
The next day…
Dev
Bridge
Call
No code
updates
War
Room
DBA
“Try
this”
Test
DBA
“Try
this”
Test
SysAdmin
“Try
this”
Test
SysEng
“Try
this”
Test
SysEng
“Try
this”
Test
Incident
Commander
“New Theory: problem with
stored procedures… but
not sure what”
Incident
Commander
DB Vendor phone
support isn’t
cutting it.
Call Center
Manager
What is going
on?
Call Center
Director
What is being
done?
Tuesday 10:00am PDT
Call Center
Agent
Call Center
Agent
… so frustrating
Not again…
I can’t log in
Are you kidding
me?
How hard is it to
run a website?
Soo Sloooow
It’s broken
Customers
Incident
Commander
Vendor
Management
We only paid for
bronze support
Approval
Request
“Need to upgrade
support”
Finance
??
The next day…
Dev
Bridge
Call
No code
updates
War
Room
Vendor
Consultant
“Let’s see what the vendor
consultant says”
Call Center
Manager
What is going
on?
Call Center
Director
What is being
done?
OK, let me take a
look.
Wednesday 10:00am PDT
Call Center
Agent
Call Center
Agent
… so frustrating
Not again…
I can’t log in
Are you kidding
me?
How hard is it to
run a website?
Soo Sloooow
It’s broken
Customers
Headcount: 15
Vendor
Consultant
So?
Someone toggled on the new
performance analysis feature
DBA
3:00pm
It’s been choking on a particular stored
procedure you use everywhere…
This stored procedure has
almost 400 parameters.
It’s 1 million lines
of code
but… it’s been
working for years!
?
?
?
DBA
Dev
Ops
SysEng
QA
Ops
QA
DBA
change
config
load
test
Dev
1:00am
Headcount: 10
Postmortem…
Vendor
Consultant
Dir
Finance
No budget
GM, Line of
Business
Stay on
schedule
You should really
fix that…
Ops
It’s not fixed.
It’s just turned off.
VP Ops
I’m told bug
#8543 is P1, but
was rejected?
Ops
Refactor it before
it bites us again.
VP Dev
It’s not a bug.
You already have
a fix.
Dev
wins
Dev
wins
Dev
No time.
Dev
Their change
broke it.
Dev vs Ops
Response labor: $270,000
Lost call center productivity: $620,000
Total: $890,000
(+ project delays)
(+ brand damage)
> $1,000,000
How did they end up here?
Corporate Plan
Annual Budget
Project Plan
Requirements
Context
Context
Process
Process
Tooling
Tooling
Capacity
Capacity
What were they thinking?
26 ITIL Processes
Service Validation & Testing
Strategy Management for IT Services
Supplier Management
The 7-Step Improvement Process
Transition Planning & Support
Access Management
Availability Management
Business Relationship Management
Capacity Management
Change Management
Change Evaluation
Demand Management
Design Coordination
Event Management
Financial Management for IT Services
Incident Management
Information Security Management
IT Service Continuity Management
Knowledge Management Process
Problem Management Process
Release & Deployment Management
Request Fulfillment Process
Service Asset & Configuration Management
Service Catalog Management
Service Level Management
Service Portfolio Management
ITIL Processes
The same as everyone else.
Encourages Silos (each with its own Context, Process, Tooling, Capacity)
Command and Control Management
Deming
“3. Cease dependence on inspection to achieve quality.”
Charity Majors
“Distributed systems have an infinite list of almost impossible failure scenarios”
Is there a different way?
The Rise of a New IT Operations Support Model
By 2015, DevOps will evolve from a niche strategy employed by large cloud providers into a mainstream strategy employed by 20% of Global 2000 organizations.
Why DevOps will emerge:
• DevOps is not usually driven from the top down and, thus, may be more easily accepted by IT operations teams.
• ITIL and other best practices frameworks are acknowledged to have not delivered on their goals, enabling IT organizations to look for new models.
• The growing interest in tools such as Chef, Puppet, etc., will help stimulate demand for OSS-based management.
Why DevOps will not emerge:
• Cultural changes are the hardest to implement, and DevOps requires a significant rethinking of IT operations conventional wisdom.
• There is a large body of work with respect to ITIL and other best practices frameworks that is already accepted within the industry.
• Open source (OSS) management tools, which are more aligned with this approach, have not seen significant enterprise market share traction.
March 18, 2011
Cameron Haight
DevOps vs ITIL?
DevOps…
Product, Not Project
Continuous Delivery
Shift Left
(and more!)
…then comes SRE
Error Budgets
Toil Limits
Cloud Native
(and more!)
Product, Not Project + Continuous Delivery + Shift Left + Error Budgets + Toil Limits + Cloud Native
“Value-Aligned” and Self-Regulating
Dev + Ops in Cross-Functional Teams
Shared Responsibility Model
“DevOps is a deconstructive movement”
Jon Hall
[Diagram: Developer → Release Plan → Deploy Feature to Production, repeated for five developers working independently; Old Release Still Running; Bugs]
Immutable microservice deployment scales, is faster with large teams and diverse platform components
Adrian Cockcroft, DockerCon EU 2014
https://www.youtube.com/watch?v=nMTaS07i3jk
Architecture enables speed. Speed is the advantage.
Keeps the people out of
their own way!
What is the innovation of SRE?
Principles are what make SRE different
1. SRE needs Service Level Objectives, with consequences
Stephen Thorne, Google, at DevOps Enterprise Summit London 2018: “Principles of SRE”
https://youtu.be/c-w_GYvi0eA
SLO and Error Budgets: Tools for Shared Responsibility
[Gauge from 0 to 100: Service Level Indicator, Service Level Objective, Error Budget*]
(*Use this to improve the service)
DEV, BIZ, and Ops: SLO takes priority!!
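To make the error budget concrete: with a request-based SLI, the budget is simply the share of requests the SLO allows to fail. The sketch below assumes a 99.9% SLO over a window of 1,000,000 requests (both made-up numbers), and release_decision is a hypothetical example of the “consequences” an SLO can carry, not a prescribed rule.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (e.g. the last 30 days)."""

    slo: float  # target fraction of good requests, e.g. 0.999
    total: int  # all requests the SLI measured in the window
    bad: int    # requests the SLI counted as failed (errors, too slow, ...)

    def sli(self) -> float:
        """Service Level Indicator: measured fraction of good requests."""
        return 1.0 if self.total == 0 else 1 - self.bad / self.total

    def error_budget(self) -> float:
        """Number of bad requests the SLO tolerates in this window."""
        return (1 - self.slo) * self.total

    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent; negative means overspent."""
        budget = self.error_budget()
        if budget == 0:
            # A 100% SLO (or an empty window) leaves no room for any failure.
            return 1.0 if self.bad == 0 else float("-inf")
        return 1 - self.bad / budget


def release_decision(window: SloWindow) -> str:
    """Hypothetical error budget policy: the consequence behind the SLO."""
    if window.budget_remaining() > 0:
        return "ship features"  # budget left: spend it on velocity
    return "freeze features; fix reliability"  # budget spent: SLO takes priority


if __name__ == "__main__":
    w = SloWindow(slo=0.999, total=1_000_000, bad=250)
    print(round(w.error_budget()))         # 1000
    print(round(w.budget_remaining(), 2))  # 0.75
    print(release_decision(w))             # ship features
```

The same budget is shared by Dev, Biz, and Ops: spending it on risky launches and spending it on outages draw from one pool, which is what makes it a tool for shared responsibility.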
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
Toil: Name For a Problem We’ve All Felt
“Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.”
-Vivek Rau, Google
Toil vs. Engineering Work
Toil | Engineering Work
Lacks Enduring Value | Builds Enduring Value
Rote, Repetitive | Creative, Iterative
Tactical | Strategic
Increases With Scale | Enables Scaling
Can Be Automated | Requires Human Creativity
Excessive Toil Prevents Fixing the System
Toil at manageable percentage of capacity: engineering work can reduce toil and improve the business.
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”): no capacity to reduce toil, no capacity to improve business.
Downward spiral is inevitable!
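The downward spiral can be illustrated with a deliberately crude toy model. Every number below (100 team hours a week, toil growing 5% a week as the service grows, each engineering hour retiring 0.1 hours of weekly toil) is a made-up assumption; only the shape of the result matters. With these parameters the tipping point works out to roughly 65% of capacity: start below it and engineering work drives toil down, start above it and toil eventually consumes all capacity.

```python
def toil_share_over_time(
    initial_toil: float,
    capacity: float = 100.0,  # team hours per week (made-up number)
    growth: float = 0.05,     # weekly toil growth as the service grows (assumed)
    payoff: float = 0.1,      # toil hours/week retired per engineering hour (assumed)
    weeks: int = 20,
) -> list[float]:
    """Toy model: share of team capacity consumed by toil, week by week."""
    toil = initial_toil
    shares = []
    for _ in range(weeks):
        toil *= 1 + growth                            # toil scales with the service
        engineering = max(capacity - toil, 0.0)       # only leftover time is engineering
        toil = max(toil - payoff * engineering, 0.0)  # engineering retires some toil
        # Toil above capacity is work the team can no longer keep up with.
        shares.append(min(toil / capacity, 1.0))
    return shares


if __name__ == "__main__":
    print(round(toil_share_over_time(30.0)[-1], 2))  # 0.0 (manageable: toil shrinks)
    print(round(toil_share_over_time(80.0)[-1], 2))  # 1.0 (engineering bankruptcy)
```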
Toil is a Naturally Occurring Force
General Evolution of Automation
1. No automation
2. Externally maintained system-specific automation
3. Externally maintained generic automation
4. Internally maintained system-specific automation
5. Systems that don’t need any automation
Niall Murphy, Microsoft Azure
[Diagram: Toil appears throughout a service's life, from Launch (ToDos & Unknowns) to Mature]
Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
SRE teams have the ability to regulate their workload
SRE can say no.
Example:
What if handing off responsibility to SRE/Ops wasn’t a right?
(separate “running in production” from “run by SRE/Ops”)
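One way to picture “SRE can say no” is an explicit acceptance review before SRE/Ops takes over a service. The criteria below (an SLO exists, the error budget isn’t exhausted, expected toil is under the limit, runbooks exist) and all names are hypothetical illustrations; a service that fails the review still runs in production, it just stays with the team that built it.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceReadiness:
    """Hypothetical facts SRE might check before accepting a service."""

    name: str
    has_slo: bool
    within_error_budget: bool
    expected_toil_pct: float  # share of on-call time the service would add as toil
    has_runbooks: bool


@dataclass
class HandoffDecision:
    accepted: bool
    reasons: list[str] = field(default_factory=list)


def review_handoff(svc: ServiceReadiness, toil_limit: float = 50.0) -> HandoffDecision:
    """'Running in production' is not the same as 'run by SRE/Ops'."""
    reasons = []
    if not svc.has_slo:
        reasons.append("no SLO defined")
    if not svc.within_error_budget:
        reasons.append("error budget exhausted")
    if svc.expected_toil_pct > toil_limit:
        reasons.append(f"toil {svc.expected_toil_pct:.0f}% exceeds {toil_limit:.0f}% limit")
    if not svc.has_runbooks:
        reasons.append("no runbooks")
    # Any unmet criterion means the developing team keeps operating the service.
    return HandoffDecision(accepted=not reasons, reasons=reasons)


if __name__ == "__main__":
    decision = review_handoff(
        ServiceReadiness("search", has_slo=False, within_error_budget=True,
                         expected_toil_pct=70.0, has_runbooks=True)
    )
    print(decision.accepted, decision.reasons)
```

Returning the list of reasons, rather than a bare yes/no, gives the owning team a concrete path to a later handoff.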
What's the Difference Between DevOps and SRE? (class SRE implements DevOps)
@sethvargo @lizthegrey
Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences: company-wide culture change (hard!)
2. SREs have time to make tomorrow better than today: reduce toil. Everybody wins!
3. SRE teams have the ability to regulate their workload: company-wide culture change (hard!)
Why focus on reducing toil?
1. Lots of value independent of “SRE”
2. Your people are your most expensive assets… stay out of their way!
Start reducing toil today
1. Track toil levels for each team
Track toil levels for each team
• Standardize (e.g. meetings and email are “overhead”, not “toil”)
• Track via:
  • Self-reporting
  • Periodic surveys
  • SM or PM interview/sampling
• Don’t get lost in the time-tracking weeds!
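Once the categories are standardized, computing a team's toil level from self-reports is simple arithmetic. This sketch assumes a hypothetical report format in which each entry is tagged “toil”, “engineering”, or “overhead”; overhead is excluded from both the numerator and the denominator, which is a design choice rather than a standard.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class TimeReport(NamedTuple):
    """One self-reported entry (hypothetical format)."""

    team: str
    category: str  # "toil", "engineering", or "overhead"
    hours: float


def toil_percentage(reports: Iterable[TimeReport]) -> dict[str, float]:
    """Per-team toil as a percentage of toil + engineering hours.

    Overhead (meetings, email) is neither toil nor engineering, so it is
    left out of both the numerator and the denominator.
    """
    toil: dict[str, float] = defaultdict(float)
    work: dict[str, float] = defaultdict(float)
    for r in reports:
        if r.category == "overhead":
            continue
        if r.category not in ("toil", "engineering"):
            raise ValueError(f"unknown category: {r.category!r}")
        work[r.team] += r.hours
        if r.category == "toil":
            toil[r.team] += r.hours
    return {team: 100 * toil[team] / hours for team, hours in work.items() if hours > 0}


if __name__ == "__main__":
    reports = [
        TimeReport("payments", "toil", 24),
        TimeReport("payments", "engineering", 12),
        TimeReport("payments", "overhead", 4),
        TimeReport("search", "toil", 10),
        TimeReport("search", "engineering", 30),
    ]
    for team, pct in toil_percentage(reports).items():
        print(f"{team}: {pct:.0f}% toil")
```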
Start reducing toil today
1. Track toil levels for each team
2. Set a toil limit for each team (50% is conventional wisdom)
3. Fund efforts to reduce toil (with emphasis on teams already over the limit)
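Steps 2 and 3 can be combined into a funding shortlist: compare each team's toil level against the limit and rank the teams already over it. The 50% default comes from the slide; the function name, team names, and levels are made up for illustration.

```python
TOIL_LIMIT_PCT = 50.0  # the slide's "conventional wisdom" limit


def teams_to_fund(
    toil_pct: dict[str, float], limit: float = TOIL_LIMIT_PCT
) -> list[tuple[str, float]]:
    """Teams over the toil limit, most over first, with their overage in points."""
    over = [(team, pct - limit) for team, pct in toil_pct.items() if pct > limit]
    return sorted(over, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    levels = {"payments": 66.7, "search": 25.0, "identity": 81.0}
    for team, overage in teams_to_fund(levels):
        print(f"{team}: {overage:.1f} points over the limit")
```

Ranking by overage is one simple way to express “emphasis on teams already over the limit”; weighting by team size or business impact would be equally reasonable.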
Example process: “Code Yellow”
Michael Kehoe and Todd Palino (LinkedIn) at SREcon Americas 2019
Where to focus?
Toil:
• Reduce Technical Debt
• Re-Engineer Processes
• Enable Self-Service
Self-Service (runbooks)
Eliminate Interruptions
Eliminate Waiting
… and a lot less toil
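A self-service runbook can be as small as a catalog of named, pre-approved commands that requesters run themselves instead of filing a ticket. The catalog, job names, and placeholder commands below are hypothetical; a real setup would typically add access control and audit logging, which this sketch omits.

```python
import subprocess
import sys

# Hypothetical runbook catalog: each job name maps to a fixed, reviewed command.
# Requesters choose a job by name; they never type a raw command or file a ticket.
# The placeholder commands only print a message so the sketch is safe to run.
RUNBOOKS: dict[str, list[str]] = {
    "clear-cache": [sys.executable, "-c", "print('cache cleared')"],
    "rotate-logs": [sys.executable, "-c", "print('logs rotated')"],
}


def run_runbook(job: str, timeout: float = 60.0) -> str:
    """Run a catalog job on request and return its output."""
    if job not in RUNBOOKS:
        raise KeyError(f"no such runbook: {job}")
    result = subprocess.run(
        RUNBOOKS[job],
        capture_output=True,  # hand the output straight back to the requester
        text=True,
        timeout=timeout,
        check=True,           # a failing job raises instead of failing silently
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_runbook("clear-cache"))  # cache cleared
```

Passing a fixed argument list (no shell) keeps requesters inside the approved catalog, which is what lets the operations team stop being the bottleneck without giving up control.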
Empower teams to spot and fix the anti-patterns.
“Fix this for me, fix it again, then fix it again.”
[Diagram. Before: each “I need you to do X” request becomes a Ticket, a Wait, and an Interrupt of “your other work”, repeated every time. After: Self-Service lets the requester do X themselves, and “your other work” grows to x2 and x3.]
“I could fix it, but I can’t get to it.”
[Diagram. Before: “I could fix it if I could get to it” (Wait, Interrupt). After: Self-Service access to the Environment: “I’ve got this!”]
“The dog-pile.”
[Diagram. Before: “I think it’s a problem with db07-store2.uswest.acme”, and everyone piles onto the host to run “$ top”. After: Self-Service: a single “healthcheck store2 -all” run against db07-store2.uswest.acme.]
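The self-service fix for the dog-pile is one shared diagnostic that runs every check once and gives everyone the same answer, instead of each responder logging in to run top. This sketch assumes pluggable check functions; the real checks and the exact behavior of the slide's “healthcheck store2 -all” command are not specified, so the function, types, and the second host name in the usage example are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A check takes a host name and returns (ok, detail). Real checks (CPU, disk,
# replication lag, ...) are hypothetical here; plug in whatever you have.
Check = Callable[[str], tuple[bool, str]]
Report = dict[str, dict[str, tuple[bool, str]]]


def healthcheck(hosts: list[str], checks: dict[str, Check]) -> Report:
    """Run every check against every host once; everyone reads the same report."""

    def run_checks(host: str) -> dict[str, tuple[bool, str]]:
        return {name: check(host) for name, check in checks.items()}

    with ThreadPoolExecutor(max_workers=8) as pool:
        # Hosts are checked in parallel; map() keeps results in host order.
        return dict(zip(hosts, pool.map(run_checks, hosts)))


if __name__ == "__main__":
    def fake_load(host: str) -> tuple[bool, str]:
        # Stand-in check: pretend db07 is the overloaded host.
        ok = not host.startswith("db07")
        return ok, "load normal" if ok else "load high"

    report = healthcheck(
        ["db07-store2.uswest.acme", "db08-store2.uswest.acme"], {"load": fake_load}
    )
    for host, results in report.items():
        print(host, results)
```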
“I’m an expert, I don’t read the wiki.”
[Diagram. Before: the docs warn “Service has changed. Use this flag or bad things will happen!” and “Pause monitoring first or we all get woken up!”, but the expert runs “restart -doit -now” anyway: “I’ve done this before. I’ve got this…” Later…]
[Diagram. After: the docs’ warnings (“Service has changed. Use this flag or bad things will happen!”, “Pause monitoring first or we all get woken up!”) go into an updated Self-Service Restart Job, so a plain “restart” against the Environment succeeds ✅: “I’ve done this before. I’ve got this.”]
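The “after” picture amounts to moving the wiki's warnings into the job itself. In this sketch the monitoring and restart operations are injected as hypothetical callables and the flag name is a placeholder; the try/finally ensures monitoring is resumed even if the restart fails.

```python
from typing import Callable

# Knowledge that used to live only in the wiki, now maintained with the job.
# The flag name is a placeholder; use whatever the changed service requires.
REQUIRED_RESTART_FLAGS = ["--hypothetical-required-flag"]


def restart_job(
    service: str,
    pause_monitoring: Callable[[str], None],
    resume_monitoring: Callable[[str], None],
    restart: Callable[[str, list[str]], None],
) -> None:
    """Self-service restart with the wiki's warnings built in."""
    pause_monitoring(service)  # "Pause monitoring first or we all get woken up!"
    try:
        # "Service has changed. Use this flag or bad things will happen!"
        restart(service, REQUIRED_RESTART_FLAGS)
    finally:
        resume_monitoring(service)  # re-enable alerts even if the restart fails


if __name__ == "__main__":
    restart_job(
        "store2",
        pause_monitoring=lambda s: print(f"monitoring paused for {s}"),
        resume_monitoring=lambda s: print(f"monitoring resumed for {s}"),
        restart=lambda s, flags: print(f"restarting {s} with {flags}"),
    )
```

When the service changes again, only the job needs updating; nobody has to remember to read the wiki.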
“Known issue… doesn’t get permanent fix”
Recap: Make Tomorrow Better Than Today
• Beware: impact of traditional management structures
• Be practical and start focusing on toil
• Find and fix toil anti-patterns
• Empower with Self-Service Runbooks
• SRE is a new way to think about Ops work:
  1. SRE needs Service Level Objectives, with consequences
  2. SREs have time to make tomorrow better than today
  3. SRE teams have the ability to regulate their workload
• Use DevOps and SRE to improve speed and quality
Let’s talk…
@damonedwards
damon@rundeck.com
Continuous delivery while minimizing performance risksa32an
 
A Business Case for Git - Tim Pettersen
A Business Case for Git - Tim PettersenA Business Case for Git - Tim Pettersen
A Business Case for Git - Tim PettersenAtlassian
 
Javaland 2017: "You´ll do microservices now". Now what?
Javaland 2017: "You´ll do microservices now". Now what?Javaland 2017: "You´ll do microservices now". Now what?
Javaland 2017: "You´ll do microservices now". Now what?André Goliath
 
Introducing Zeebe.io at Camunda Meetup Vienna 10/2017
Introducing Zeebe.io at Camunda Meetup Vienna 10/2017Introducing Zeebe.io at Camunda Meetup Vienna 10/2017
Introducing Zeebe.io at Camunda Meetup Vienna 10/2017Daniel Meyer
 
Microservice Orchestration at any Scale - Zalando Tech Meetup 09/2017
Microservice Orchestration at any Scale - Zalando Tech Meetup 09/2017 Microservice Orchestration at any Scale - Zalando Tech Meetup 09/2017
Microservice Orchestration at any Scale - Zalando Tech Meetup 09/2017 Zeebe
 
Atlassian - Software For Every Team
Atlassian - Software For Every TeamAtlassian - Software For Every Team
Atlassian - Software For Every TeamSven Peters
 
The Enterprise Architecture you always wanted: A Billion Transactions Per Mon...
The Enterprise Architecture you always wanted: A Billion Transactions Per Mon...The Enterprise Architecture you always wanted: A Billion Transactions Per Mon...
The Enterprise Architecture you always wanted: A Billion Transactions Per Mon...Thoughtworks
 
David Nuescheler: Igniting CQ 5.3: What's New and Roadmap
David Nuescheler: Igniting CQ 5.3: What's New and RoadmapDavid Nuescheler: Igniting CQ 5.3: What's New and Roadmap
David Nuescheler: Igniting CQ 5.3: What's New and RoadmapDay Software
 
JavaOne - Performance Focused DevOps to Improve Cont Delivery
JavaOne - Performance Focused DevOps to Improve Cont DeliveryJavaOne - Performance Focused DevOps to Improve Cont Delivery
JavaOne - Performance Focused DevOps to Improve Cont DeliveryAndreas Grabner
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Rundeck
 
Finding and fixing top performance issues with new relic rpm
Finding and fixing top performance issues with new relic rpmFinding and fixing top performance issues with new relic rpm
Finding and fixing top performance issues with new relic rpmBrian Doll
 
Rails Operations - Lessons Learned
Rails Operations -  Lessons LearnedRails Operations -  Lessons Learned
Rails Operations - Lessons LearnedJosh Nichols
 

Similar to SysAdmin to SRE: Solving the Last Mile Problem (20)

Deploying 3 times a day without a downtime @ Rocket Tech Summit in Berlin
Deploying 3 times a day without a downtime @ Rocket Tech Summit in BerlinDeploying 3 times a day without a downtime @ Rocket Tech Summit in Berlin
Deploying 3 times a day without a downtime @ Rocket Tech Summit in Berlin
 
Björn Rabenstein - About SRE – and how (not) to apply it - Codemotion Berlin ...
Björn Rabenstein - About SRE – and how (not) to apply it - Codemotion Berlin ...Björn Rabenstein - About SRE – and how (not) to apply it - Codemotion Berlin ...
Björn Rabenstein - About SRE – and how (not) to apply it - Codemotion Berlin ...
 
Björn Rabenstein - About SRE and how (not) to apply it - Codemotion Berlin 2018
Björn Rabenstein - About SRE and how (not) to apply it - Codemotion Berlin 2018Björn Rabenstein - About SRE and how (not) to apply it - Codemotion Berlin 2018
Björn Rabenstein - About SRE and how (not) to apply it - Codemotion Berlin 2018
 
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
 
How to Better Manage Technical Debt While Innovating on DevOps
How to Better Manage Technical Debt While Innovating on DevOpsHow to Better Manage Technical Debt While Innovating on DevOps
How to Better Manage Technical Debt While Innovating on DevOps
 
NoOps for noobs; why i think Devs do not need Ops
NoOps for noobs; why i think Devs do not need OpsNoOps for noobs; why i think Devs do not need Ops
NoOps for noobs; why i think Devs do not need Ops
 
The servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECONThe servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECON
 
Continuous delivery while minimizing performance risks
Continuous delivery while minimizing performance risksContinuous delivery while minimizing performance risks
Continuous delivery while minimizing performance risks
 
A Business Case for Git - Tim Pettersen
A Business Case for Git - Tim PettersenA Business Case for Git - Tim Pettersen
A Business Case for Git - Tim Pettersen
 
Javaland 2017: "You´ll do microservices now". Now what?
Javaland 2017: "You´ll do microservices now". Now what?Javaland 2017: "You´ll do microservices now". Now what?
Javaland 2017: "You´ll do microservices now". Now what?
 
Introducing Zeebe.io at Camunda Meetup Vienna 10/2017
Introducing Zeebe.io at Camunda Meetup Vienna 10/2017Introducing Zeebe.io at Camunda Meetup Vienna 10/2017
Introducing Zeebe.io at Camunda Meetup Vienna 10/2017
 
Devops down-under
Devops down-underDevops down-under
Devops down-under
 
Microservice Orchestration at any Scale - Zalando Tech Meetup 09/2017
Microservice Orchestration at any Scale - Zalando Tech Meetup 09/2017 Microservice Orchestration at any Scale - Zalando Tech Meetup 09/2017
Microservice Orchestration at any Scale - Zalando Tech Meetup 09/2017
 
Atlassian - Software For Every Team
Atlassian - Software For Every TeamAtlassian - Software For Every Team
Atlassian - Software For Every Team
 
The Enterprise Architecture you always wanted: A Billion Transactions Per Mon...
The Enterprise Architecture you always wanted: A Billion Transactions Per Mon...The Enterprise Architecture you always wanted: A Billion Transactions Per Mon...
The Enterprise Architecture you always wanted: A Billion Transactions Per Mon...
 
David Nuescheler: Igniting CQ 5.3: What's New and Roadmap
David Nuescheler: Igniting CQ 5.3: What's New and RoadmapDavid Nuescheler: Igniting CQ 5.3: What's New and Roadmap
David Nuescheler: Igniting CQ 5.3: What's New and Roadmap
 
JavaOne - Performance Focused DevOps to Improve Cont Delivery
JavaOne - Performance Focused DevOps to Improve Cont DeliveryJavaOne - Performance Focused DevOps to Improve Cont Delivery
JavaOne - Performance Focused DevOps to Improve Cont Delivery
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
Finding and fixing top performance issues with new relic rpm
Finding and fixing top performance issues with new relic rpmFinding and fixing top performance issues with new relic rpm
Finding and fixing top performance issues with new relic rpm
 
Rails Operations - Lessons Learned
Rails Operations -  Lessons LearnedRails Operations -  Lessons Learned
Rails Operations - Lessons Learned
 

More from Rundeck

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps Rundeck
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationRundeck
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckRundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & AnsibleRundeck
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...Rundeck
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control PoliciesRundeck
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckRundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4 Rundeck
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...Rundeck
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Rundeck
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuRundeck
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response Rundeck
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Rundeck
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Rundeck
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck OverviewRundeck
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationRundeck
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings Rundeck
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Rundeck
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
 

More from Rundeck (20)

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process Automation
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in Rundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & Ansible
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control Policies
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in Rundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + Sensu
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck Overview
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
 

Recently uploaded

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Recently uploaded (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

SysAdmin to SRE: Solving the Last Mile Problem

  • 1. SysAdmin to SRE: Solving the Last Mile Problem Damon Edwards @damonedwards
  • 2.
  • 4. Operations: The Last Mile. Silos, Queues, Excessive Toil, Low Trust
  • 7.
  • 8. “SRE… When you ask software engineers to do operations” “SRE… Next-generation, cloud-native Operations” Class SRE implements DevOps “SRE… When Ops does more engineering than Ops”
  • 10. Why SRE? Simon Sinek: Start with “why?”
  • 12. It was just another Thursday…
  • 13. Thursday 10:00am PDT (1,200 agents). Customers to call center agents: “My browser times out!” “Wow, this is so slow!” “I can’t log in.” “What a c#@p service!” “I can’t log in.” “Barely works.” “It’s broken.”
  • 14. Customers and VIP customers, call center agents, Technical Support, Service Desk. Many calls, many tickets: “Stuff isn’t working.”
  • 15. Call center agents, Technical Support, Service Desk. Many tickets, many calls: “Stuff isn’t working.” Ops: “…but monitoring is all green.” (OK OK OK OK OK)
  • 16. 3:30pm. Monitoring is still all green (OK OK OK OK OK). Customers and call center agents: “Now it works.” “Now it works.” Service Desk and Ops: “?”
  • 18. Friday 9:00am PDT. Customers and VIP customers to call center agents: “My browser times out!” “Wow, this is so slow!” “I can’t log in.” “Are you kidding me? How hard is it to run a website?” “Soo Sloooow.” “It’s broken.”
  • 19. Customers and VIP customers, call center agents, Technical Support, Service Desk. Many calls, many tickets: “Stuff isn’t working.” “…but monitoring is all green.” (OK OK OK OK OK)
  • 20. Service Desk: “Escalate!” The ticket goes to the Incident Commander: “Launch the incident bridge.” Bridge call with Ops, Dev, Sec, and the Incident Commander: “Not me…” “Not me…” “Not me…” “Not me…” Dev: “No code updates.” “Probably not the new server hardening process or the network changes…” Headcount: 40
  • 21. Dev: “No code updates.” “Probably not the new server hardening process or the network changes…” Ops: “Uhh… WHAT new server hardening process and network changes?” Sec: “We were going to fail audit… you didn’t get the email?”
  • 25. Monday 10:00am PDT. Customers and VIP customers to call center agents: “… so frustrating.” “Not again…” “I can’t log in.” “Are you kidding me? How hard is it to run a website?” “Soo Sloooow.” “It’s broken.”
  • 26. Customers and VIP customers, call center agents, Technical Support, Service Desk. Many calls, many tickets: “Stuff isn’t working.” “…but monitoring is all green.” (OK OK OK OK OK)
  • 27. Service Desk: “…but monitoring is all green.” (OK OK OK OK OK) The ticket reaches the Customer Systems Lead Dev in Scrum: “ding!” “Ignore.” Incident Commander: “Hey, did you see that ticket?”
  • 28. Customer Systems Lead Dev: “Ignore.” Incident Commander: “Hey, did you see that ticket?” Customer Systems Lead Dev (in Scrum): “sigh. I’ll take a look.”
  • 29. Customer Systems Lead Dev: “I’ll take a look.” … “Something is wrong with the database connection… but our code didn’t change.” DBA: “No recent database updates.”
  • 31. Bridge call and war room. Dev: “No code updates.” DBA, SysAdmin, Network, Security, and SysEng each: “Try this.” Test. Incident Commander: “New theory: it’s the database connection.” 4:00pm. Customers and call center agents: “Now it works.” “Now it works.” Call Center Manager: “What is going on?” Headcount: 20
  • 33. Tuesday 10:00am PDT. Customers to call center agents: “… so frustrating.” “Not again…” “I can’t log in.” “Are you kidding me? How hard is it to run a website?” “Soo Sloooow.” “It’s broken.” Bridge call and war room. Dev: “No code updates.” DBA, DBA, SysAdmin, SysEng, SysEng: “Try this.” Test. Incident Commander: “New theory: a problem with stored procedures… but not sure what.” Incident Commander: “DB vendor phone support isn’t cutting it.” Call Center Manager: “What is going on?” Call Center Director: “What is being done?”
  • 34. War room. Dev: “No code updates.” Test, test, test, test, test. Incident Commander and Vendor Management: “DB vendor phone support isn’t cutting it.” “We only paid for bronze support.” Approval request to Finance: “Need to upgrade support.” Finance: “??” Call Center Manager: “What is going on?” Call Center Director: “What is being done?”
  • 36. Dev Bridge Call No code updates War Room Vendor Consultant “Let’s see with the vendor consultant says” Call Center Manager What is going on? Call Center Director What is being done? OK, let me take a look. Ven Cons So per Wednesday 10:00am PDT Call Center Agent Call Center Agent … so frustrating Not again… I can’t login Are you kidding me? How hard is it to run a website? Soo Sloooow It’s broken Customers Headcount: 15
  • 37. Dev e No code updates War Room Call Center Manager What is going on? Call Center Director What is being done? Vendor Consultant So? Someone toggled on the new performance analysis feature DBA 3:00pm dcount: 15
  • 38. “So?” Vendor Consultant: It’s been choking on a particular stored procedure you use everywhere… This stored procedure has almost 400 parameters. “It’s 1 million lines of code, but… it’s been working for years!” DBA, Dev: ? ? ?
  • 39. “…but it’s been working for years!” (? ? ?) Ops, SysEng, QA, DBA, Dev: change config, load test. 1:00am. Headcount: 10
  • 42. Dev vs Ops. Vendor Consultant: You should really fix that… Ops: It’s not fixed. It’s just turned off. VP Ops: I’m told bug #8543 is P1, but was rejected? Ops: Refactor it before it bites us again. VP Dev: It’s not a bug. You already have a fix. Dev: No time. Dev: Their change broke it. Dir Finance: No budget. GM, Line of Business: Stay on schedule. Dev wins. Dev wins.
  • 48. (Montage of all the preceding incident slides.)
  • 52. (Montage of all the preceding incident slides.) Response labor: $270,000. Lost call center productivity: $620,000. Total: $890,000 (+ project delays) (+ brand damage) > $1,000,000
  • 53. How did they end up here?
  • 58. Corporate Plan, Annual Budget, Project Plan, Requirements. (Diagram: two silos, each with its own Context, Process, Tooling, and Capacity.)
  • 59. What were they thinking?
  • 60. 26 ITIL Processes: Access Management, Availability Management, Business Relationship Management, Capacity Management, Change Management, Change Evaluation, Demand Management, Design Coordination, Event Management, Financial Management for IT Services, Incident Management, Information Security Management, IT Service Continuity Management, Knowledge Management Process, Problem Management Process, Release & Deployment Management, Request Fulfillment Process, Service Asset & Configuration Management, Service Catalog Management, Service Level Management, Service Portfolio Management, Service Validation & Testing, Strategy Management for IT Services, Supplier Management, The 7 Step Improvement, Transition Planning & Support. The same as everyone else.
  • 65. 26 ITIL Processes: Encourages Silos. (Diagram: two silos, each with its own Context, Process, Tooling, and Capacity.)
  • 66. Command and Control Management
  • 67. Deming: “3. Cease dependence on inspection to achieve quality.”
  • 68. Charity Majors: “Distributed systems have an infinite list of almost impossible failure scenarios”
  • 70. Is there a different way?
  • 71. DevOps vs ITIL? “The Rise of a New IT Operations Support Model”, Cameron Haight, March 18, 2011. By 2015, DevOps will evolve from a niche strategy employed by large cloud providers into a mainstream strategy employed by 20% of Global 2000 organizations.
    Why DevOps will emerge:
    - DevOps is not usually driven from the top down and, thus, may be more easily accepted by IT operations teams.
    - ITIL and other best practices frameworks are acknowledged to have not delivered on their goals, enabling IT organizations to look for new models.
    - The growing interest in tools such as Chef, Puppet, etc., will help stimulate demand for OSS-based management…
    Why DevOps will not emerge:
    - Cultural changes are the hardest to implement, and DevOps requires a significant rethinking of IT operations conventional wisdom.
    - There is a large body of work with respect to ITIL and other best practices frameworks that is already accepted within the industry.
    - Open source (OSS) management tools, which are more aligned with this approach, have not seen significant enterprise market share traction.
  • 80. Product, Not Project + Continuous Delivery + Shift Left + Error Budgets + Toil Limits + Cloud Native: “Value-Aligned” and Self-Regulating Cross-Functional Teams (Dev and Ops). Shared Responsibility Model. “DevOps is a deconstructive movement” (Jon Hall)
  • 83. Architecture enables speed. Speed is the advantage. Keeps the people out of their own way! (Diagram: developers with separate release plans, each deploying features to production independently while the old release is still running.) “Immutable microservice deployment scales, is faster with large teams and diverse platform components.” Adrian Cockcroft, DockerCon EU 2014, https://www.youtube.com/watch?v=nMTaS07i3jk
  • 84. What is the innovation of SRE?
  • 87. Principles are what make SRE different: 1. SRE needs Service Level Objectives, with consequences (Stephen Thorne, Google, at DevOps Enterprise Summit London 2018, “Principles of SRE,” https://youtu.be/c-w_GYvi0eA)
  • 91. SLO and Error Budgets: Tools for Shared Responsibility. (Diagram: a 0 to 100 scale marking the Service Level Objective, the Error Budget*, and the Service Level Indicator, shared by Dev, Biz, and Ops.) *Use this to improve the service. SLO takes priority!
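The slide shows the error budget only as a diagram. As a minimal sketch of the arithmetic, assuming a request-based SLI (the fraction of successful requests in one window) and a hypothetical policy of freezing feature launches once the budget is spent as one possible “consequence,” it might look like this. `ErrorBudget` and `release_policy` are illustrative names, not anything from the talk.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one measurement window (illustrative)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int
    failed_requests: int

    @property
    def sli(self) -> float:
        # Service Level Indicator: observed fraction of successful requests.
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        # The budget: how many failures the SLO tolerates in this window.
        return (1 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 = budget untouched, 0.0 = spent, negative = overspent.
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / self.allowed_failures

    def release_policy(self) -> str:
        # Hypothetical "consequence": stop feature launches once the budget is spent.
        if self.remaining_fraction > 0:
            return "ship features"
        return "freeze features, fix reliability"


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"SLI {budget.sli:.4%}, budget remaining {budget.remaining_fraction:.0%}")
    print(budget.release_policy())
```

Because the same number drives the decision for Dev, Biz, and Ops alike, the policy is a shared rule rather than one team's veto; the actual threshold and response are choices each organization would have to make.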
  • 93. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today Stephen Thorne, Google At DevOps Enterprise Summit London 2018 “Principles of SRE” https://youtu.be/c-w_GYvi0eA
  • 95. Toil: Name For a Problem We’ve All Felt. “Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” (Vivek Rau, Google)
  • 96. Toil vs. Engineering Work
    Toil | Engineering Work
    Lacks Enduring Value | Builds Enduring Value
    Rote, Repetitive | Creative, Iterative
    Tactical | Strategic
    Increases With Scale | Enables Scaling
    Can Be Automated | Requires Human Creativity
  • 99. Excessive Toil Prevents Fixing the System. Toil at a manageable percentage of capacity leaves engineering work free to reduce toil and improve the business. Toil at an unmanageable percentage of capacity (“Engineering Bankruptcy”) leaves no capacity to reduce toil and no capacity to improve the business. Downward spiral is inevitable!
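A toy model, with entirely assumed parameters, shows why the spiral reinforces itself: toil grows with service scale each period, and only the hours left over for engineering can pay it down. `simulate_toil`, `growth`, and `payoff` are hypothetical; the tipping point comes entirely from these made-up rates and is not a derivation of any recommended toil limit.

```python
from typing import List


def simulate_toil(capacity: float, toil: float, growth: float,
                  payoff: float, periods: int) -> List[float]:
    """Return toil as a share of team capacity after each period (toy model)."""
    shares = []
    for _ in range(periods):
        engineering = max(capacity - toil, 0.0)   # hours left after toil
        # Toil grows with scale; each engineering hour removes `payoff` hours of toil.
        toil = toil * growth - payoff * engineering
        toil = min(max(toil, 0.0), capacity)      # clamp to [0, capacity]
        shares.append(toil / capacity)
    return shares


if __name__ == "__main__":
    healthy = simulate_toil(capacity=100, toil=40, growth=1.05, payoff=0.1, periods=20)
    bankrupt = simulate_toil(capacity=100, toil=80, growth=1.05, payoff=0.1, periods=20)
    print(f"start 40%: {healthy[-1]:.0%}, start 80%: {bankrupt[-1]:.0%}")
```

With these rates the balance point sits at two-thirds toil: a team that starts below it works its way down, while one that starts above it ends with no engineering time at all. Real teams have many more forces at play, so treat this only as an illustration of the feedback loop.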
  • 102. Toil is a Naturally Occurring Force. General Evolution of Automation (Niall Murphy, Microsoft Azure): 1. No automation 2. Externally maintained system-specific automation 3. Externally maintained generic automation 4. Internally maintained system-specific automation 5. Systems that don’t need any automation. (Diagram: toil keeps appearing along the service lifecycle, from Launch (ToDos & Unknowns) to Mature.)
  • 104. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload Stephen Thorne, Google At DevOps Enterprise Summit London 2018 “Principles of SRE” https://youtu.be/c-w_GYvi0eA
  • 109. SRE teams have the ability to regulate their workload. SRE can say no. Example: What if handing off responsibility to SRE/Ops wasn’t a right? (Separate “running in production” from “run by SRE/Ops.”)
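One way to make “SRE can say no” concrete is a hand-off review that decides who operates a service while it keeps running in production either way. The sketch below is hypothetical: the criteria (an SLO exists, error budget not exhausted, projected toil under a 50% limit) are assumptions that tie together the three principles, and `HandoffReview` and `decide_owner` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple

TOIL_LIMIT = 0.5  # assumed; the talk cites 50% as conventional wisdom


@dataclass
class HandoffReview:
    service: str
    has_slo: bool
    budget_remaining: float       # fraction of error budget left in the review window
    projected_toil_share: float   # estimated share of SRE time the service would consume


def decide_owner(review: HandoffReview,
                 toil_limit: float = TOIL_LIMIT) -> Tuple[str, List[str]]:
    """Return who operates the service and why SRE declined, if it did."""
    reasons = []
    if not review.has_slo:
        reasons.append("no SLO defined")
    if review.budget_remaining <= 0:
        reasons.append("error budget exhausted")
    if review.projected_toil_share > toil_limit:
        reasons.append("projected toil over limit")
    # The service stays in production either way; only the operator changes.
    owner = "SRE/Ops" if not reasons else "development team"
    return owner, reasons


if __name__ == "__main__":
    print(decide_owner(HandoffReview("reports", False, -0.1, 0.7)))
```

Returning the reasons alongside the decision gives the development team a concrete list of what to fix before asking again.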
  • 110. Principles of SRE are what set SRE apart 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload
  • 111. What's the Difference Between DevOps and SRE? (class SRE implements DevOps) @sethvargo @lizthegrey
  • 116. Where to start (the practical approach): 1. SRE needs Service Level Objectives, with consequences (company-wide culture change, hard!) 2. SREs have time to make tomorrow better than today (reduce toil: everybody wins!) 3. SRE teams have the ability to regulate their workload (company-wide culture change, hard!)
  • 120. Why focus on reducing toil? 1. Lots of value independent of “SRE” 2. Your people are your most expensive assets… stay out of their way!
  • 122. Start reducing toil today: 1. Track toil levels for each team
  • 127. Track toil levels for each team
    • Standardize (e.g., meetings and email are “overhead,” not “toil”)
    • Track: self-reporting, periodic surveys, SM or PM interview/sampling
    • Don’t get lost in the time-tracking weeds!
  • 131. Start reducing toil today: 1. Track toil levels for each team 2. Set a toil limit for each team (50% is conventional wisdom) 3. Fund efforts to reduce toil (with emphasis on teams already over the limit). Example process: “Code Yellow” (Michael Kehoe and Todd Palino, LinkedIn, at SREcon Americas 2019)
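Steps 1 to 3 can be sketched as a small report. This assumes each team self-reports hours tagged as toil, engineering, or overhead (meetings and email count as overhead, not toil), and that the limit applies to toil's share of all reported hours. The team names, the data shape, and `toil_shares`/`funding_priority` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TOIL_LIMIT = 0.50  # "50% is conventional wisdom" per the slide
CATEGORIES = {"toil", "engineering", "overhead"}  # meetings/email count as overhead


def toil_shares(entries: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Map each team to toil hours / all reported hours (assumed definition)."""
    toil: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    for team, category, hours in entries:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category {category!r}; standardize first")
        total[team] += hours
        if category == "toil":
            toil[team] += hours
    return {team: toil[team] / hours for team, hours in total.items() if hours > 0}


def funding_priority(shares: Dict[str, float],
                     limit: float = TOIL_LIMIT) -> List[Tuple[str, float]]:
    """Teams over the limit, worst first: where toil-reduction funding goes first."""
    over = [(team, share) for team, share in shares.items() if share > limit]
    return sorted(over, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    reported = [
        ("payments", "toil", 30), ("payments", "engineering", 10), ("payments", "overhead", 10),
        ("search", "toil", 10), ("search", "engineering", 25), ("search", "overhead", 5),
        ("identity", "toil", 24), ("identity", "engineering", 6),
    ]
    shares = toil_shares(reported)
    for team, share in funding_priority(shares):
        print(f"{team}: {share:.0%} toil (limit {TOIL_LIMIT:.0%})")
```

Rejecting unknown categories enforces the “standardize” step; beyond that, keep the input coarse so the tracking itself does not become toil.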
  • 135. Where to focus? Toil: Reduce Technical Debt, Re-Engineer Processes, Enable Self-Service
  • 141. Empower teams to spot and fix the anti-patterns.
  • 142. “Fix this for me, fix it again, then fix it again.” Before: every “I need you to do X” becomes a ticket, a wait, and an interrupt to someone’s other work before “Done.” arrives later, three times over. After: X is self-service, so each “Done.” comes without a ticket or a wait, and the time goes back to your other work (x2, x3).
  • 145. “I could fix it, but I can’t get to it.” Before: “I could fix it if I could get to it,” so fixing the environment means a wait and an interrupt. After: “I’ve got this!” with self-service access to the environment.
  • 146. “The dog-pile.” Before: “I think it’s a problem with db07-store2.uswest.acme!!” and everyone piles onto db07-store2.uswest.acme to run “$ top”. After (self-service): “I think it’s a problem with db07-store2.uswest.acme,” so run one shared “healthcheck store2 -all”.
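The fix for the dog-pile is one shared check instead of many people logging in to the same box. A minimal sketch of such a check, assuming a probe function you supply (the slide does not say what `healthcheck store2 -all` actually runs), probes every host in parallel and returns a single report. `healthcheck_all`, `unhealthy`, and every host name except db07-store2.uswest.acme are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def healthcheck_all(hosts: Iterable[str], probe: Callable[[str], bool],
                    max_workers: int = 8) -> Dict[str, bool]:
    """Probe every host once, in parallel, and return one shared report."""
    host_list = list(hosts)

    def safe_probe(host: str) -> bool:
        try:
            return bool(probe(host))
        except Exception:
            return False  # a probe that errors out counts as unhealthy

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(host_list, pool.map(safe_probe, host_list)))


def unhealthy(report: Dict[str, bool]) -> List[str]:
    return sorted(host for host, ok in report.items() if not ok)


if __name__ == "__main__":
    tier = ["db07-store2.uswest.acme", "db08-store2.uswest.acme"]
    # Stand-in probe for illustration; a real one might check load or a status endpoint.
    report = healthcheck_all(tier, probe=lambda host: not host.startswith("db07"))
    print("unhealthy:", unhealthy(report))
```

Treating a failing probe as unhealthy keeps one bad host from crashing the whole report.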
  • 149. “I’m an expert, I don’t read the wiki.” Before: the docs warn “Service has changed. Use this flag or bad things will happen!” and “Pause monitoring first or we all get woken up!”, but the expert thinks “I’ve done this before. I’ve got this…” and runs “restart -doit -now” against the environment. After: the docs’ warnings are built into an updated self-service Restart Job, so a plain “restart” does the right thing ✅ and “I’ve done this before. I’ve got this.” is finally safe to say.
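The “After” picture moves the wiki’s warnings into the job itself. Here is a minimal sketch, assuming the restart command, the monitoring pause/resume actions, and the required flag are passed in as callables and strings. The slide does not name the flag, so `--hypothetical-safe-mode` and `restart_job` are placeholders.

```python
from typing import Callable, List, Sequence


def restart_job(service: str,
                run: Callable[[List[str]], None],
                pause_monitoring: Callable[[str], None],
                resume_monitoring: Callable[[str], None],
                required_flags: Sequence[str] = ()) -> None:
    """Hypothetical self-service restart that encodes the wiki's warnings."""
    # Wiki warning 1: "Pause monitoring first or we all get woken up!"
    pause_monitoring(service)
    try:
        # Wiki warning 2: always pass the flag(s) the changed service now requires.
        run(["restart", service, *required_flags])
    finally:
        # Resume monitoring even if the restart fails, so nothing stays silenced.
        resume_monitoring(service)


if __name__ == "__main__":
    restart_job("store2",
                run=lambda argv: print("would run:", argv),
                pause_monitoring=lambda s: print("pause alerts for", s),
                resume_monitoring=lambda s: print("resume alerts for", s),
                required_flags=["--hypothetical-safe-mode"])
```

When the service changes again, whoever owns it updates the job once, instead of every operator needing to reread the docs.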
  • 150. “Known issue… doesn’t get permanent fix”
  • 152. Recap: Make Tomorrow Better Than Today. SRE is a new way to think about Ops work: 1. SRE needs Service Level Objectives, with consequences 2. SREs have time to make tomorrow better than today 3. SRE teams have the ability to regulate their workload. Beware: impact of traditional management structures. Be practical and start focusing on toil. Find and fix toil anti-patterns. Empower with Self-Service Runbooks. Use DevOps and SRE to improve speed and quality.