5. Who are we?
Hannah Foxwell, Associate Director for Pivotal Labs Platform Services EMEA
Jérôme Wiedemann, Associate Director for Pivotal Labs Platform Services EMEA
We’re on a mission to build wildly successful Platform Teams.
Technology is the easy part. People are the hard part!
@HannahFoxwell @romrider42
8. [Diagram: IaaS, Platform and Application layers, labelled "Dev Ops"]
Application Team Mission: “Build an amazing product and valuable service for my users!”
Platform Team Mission: “Build an amazing product and valuable service for my users!”
Infrastructure Team Mission: “Build an amazing product and valuable service for my users!”
9. Some other patterns...
[Diagrams: team patterns over the Platform and Application layers. Labels: Platform Eng, AppDev, AppOps, Platform Ops, Tools, SRE, Dev Ops]
WARNING: Only if you absolutely have to. Avoid silos with good communication and collaboration.
Too much centralisation can become a constraint, but providing self-serve tools can help developers focus on what’s important!
SRE can be a separate team or it can be a set of practices adopted by everyone.
11. The Platform is a Product
[Diagram: two backlogs of stories (Story 1, Story 2, Story 3). Labels: Application Users, Application Developers, Product Manager; Platform Users, Platform Engineers, Product Manager]
Think of application developers as your customers!
12. Question 2:
What is your path to production?
How can I help make it easier?
26. SLO vs. Error Budget (per 28 Days)
SLO        Error Budget
99%        403 mins
99.5%      202 mins
99.9%      40.3 mins
99.95%     20.2 mins
99.99%     4.03 mins
99.999%    0.40 mins
Your Error Budget is the complement of your SLO: 100% minus the SLO.
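The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, assuming an availability SLO measured in minutes over a 28-day window; the function name `error_budget_minutes` and its parameters are hypothetical and not part of the talk.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_percent: float, window_days: int = 28) -> float:
    """Return the allowed 'bad' minutes for an availability SLO over the window.

    Assumption: the error budget is (100% - SLO) of the total window length.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    window_minutes = window_days * MINUTES_PER_DAY  # 28 days = 40,320 minutes
    return window_minutes * (100 - slo_percent) / 100


if __name__ == "__main__":
    # Print the budget for each SLO level listed on the slide.
    for slo in (99, 99.5, 99.9, 99.95, 99.99, 99.999):
        print(f"{slo}% -> {error_budget_minutes(slo):.2f} mins")
```

Each extra "nine" divides the budget by ten, so 99.999% leaves well under one minute per 28 days.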
27. Who gets to use the Error Budget?
30. Example Error Budget Policy
Condition: Service is performing at or above SLO
Action: Continue to release changes as normal within agreed release policy
Condition: Service has exceeded Error Budget in the preceding 4 week window
Action: Halt all changes and releases other than P1 and P2 issues or security fixes
Condition: A single incident consumes more than 20% of Error Budget over 4 weeks
Action: Post-incident review. Must include at least 1 action to address the cause
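One way to make a policy like this checkable is to encode it as a small function. The sketch below is an illustration only: the `WindowStatus` type, its field names and the `policy_actions` function are hypothetical. It also assumes that "at or above SLO" means the consumed minutes have not exceeded the budget for the 4-week window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStatus:
    """Hypothetical summary of one 4-week window for a service."""
    budget_minutes: float            # total error budget for the window
    consumed_minutes: float          # budget consumed by all incidents
    largest_incident_minutes: float  # budget consumed by the worst single incident


def policy_actions(status: WindowStatus) -> List[str]:
    """Map a window summary to the actions named in the example policy."""
    actions = []
    if status.consumed_minutes > status.budget_minutes:
        # Error budget exceeded in the preceding 4-week window.
        actions.append("Halt all changes and releases other than P1 and P2 issues or security fixes")
    else:
        # Assumption: not exceeding the budget means performing at or above SLO.
        actions.append("Continue to release changes as normal within agreed release policy")
    if status.largest_incident_minutes > 0.20 * status.budget_minutes:
        # A single incident consumed more than 20% of the budget.
        actions.append("Post-incident review with at least 1 action to address the cause")
    return actions
```

For example, `policy_actions(WindowStatus(40.32, 10.0, 9.0))` keeps releases flowing but also requires a post-incident review, because one incident used more than 20% of the budget.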
31. Question 4:
What are we going to do when things go wrong?
39. Ask these 4 simple questions:
1. How are we going to work together?
2. What is your path to production?
3. What level of reliability do our users need?
4. What are we going to do when things go wrong?
Let’s work together!
40. Are we getting better at this?
Ask these 4 simple questions:
1. How are we going to work together?
2. What is your path to production?
3. What level of reliability do our users need?
4. What are we going to do when things go wrong?