Between 2016 and 2020, the PureGym IT Team grew from 4 to 40 people. During that time Rich worked closely with them to navigate the tricky process of scaling. This talk is a journey through that growth period and discusses the variety of processes and practices that were applied in order to stay "Agile".
We'll touch on the various approaches taken including topics such as Scrum, Lean, Kanban, GIST planning, Lean Startup, DevOps, IaC, Azure, Story Mapping, BDD, TDD, Project vs Product and we'll be looking at the successes, failures and lessons learned.
We'll also look at how concepts from Team Topologies were applied, what team structures and interaction models were chosen and how that shifted the team dynamic. We will discuss what, in Rich's opinion, has worked well and what hasn't worked well during that time.
At the very least you will walk away from this session with a long list of books to read and hopefully some insights into whether you might want to try some of the approaches discussed during the talk.
From four to forty in four years - lessons from growing a team
1. From four to forty in four years
Richard Allen
Conjurer Solutions Ltd
richard.allen@conjurersolutions.co.uk
@rich_allen
2. Who am I?
Owner
Founder
Head of Consulting
Richard Allen
richard.allen@conjurersolutions.co.uk
@rich_allen
devsouthcoast.com
conjurersolutions.co.uk
leavewizard.com
The Beach
Father
25. Story so far…
• Modern tech stack
• CI
• Prioritised backlog
• Scrum, planning poker, story points
• 2 week sprints
• Demo to business
• Retrospectives
• GitFlow
29. “We should work on our process, not the outcome of our processes” – W. Edwards Deming
30. Process for Continuous Improvement
Decide what you want to achieve
Go!
Notice what’s getting in the way
Remove the biggest impediment
Henrik Kniberg
31. Too many cooks
Image courtesy of https://www.daviddorman.com/post/2019/02/03/too-many-chefs
32. GitFlow - Issues
Merge performed by QA Engineer!
Encouraged long running feature branches
Aligned well with sprints but resulted in merge hell
33. Batch Flow
To Do In Progress Done Release 4 Ship every 2 weeks
Release 3
Release 2
Release 1
Deployed
Add to release batch
What caused the problem?
37. The 3 Ways: DevOps Principles
1st Way: Systems Thinking and Flow
2nd Way: Amplify Feedback Loops
3rd Way: Culture of continual experimentation and learning
38. Improve Flow & Feedback
To Do In Progress Done Deploy Ship when ready Deployed
* Goal is to be as close to single piece flow as possible
Developer responsible for end to
end delivery into production
43. Encourage flow in daily stand ups
Up Next, In Dev (5), Pre Test Review, Ready for Test (3), In Test (3), Final Review, Ready to Deploy
Walk the board from right to left
Closest to delivering value
WIP Limit broken – is there an issue here?
Blocked
What can we do to unblock this?
52. Types of work
Projects – greater than 4 weeks' work, requires a kick-off and retrospective
Bugs – any bugs in the current production system
GSD – infrastructure, monitoring, logging and developer experience
Small Changes – feature requests of less than 1 week's work
57. More work to do = more project teams
Project D
JA
UX
KE
QA
DG
CTO
JK
Dev
Small Change
MD
Dev
GSD
BL
Dev
Bugs
Project E
TM
QA
DM
Dev
MB
Dev
AW
Dev
RA
Dev
JM
Dev
58. Things were not quite right with projects
Project D
JA
UX
KE
QA
DG
CTO
JK
Dev
Small Change
MD
Dev
GSD
BL
Dev
Bugs
Project E
Project F
TM
QA
DM
Dev
MB
Dev
AW
Dev
RA
Dev
JM
Dev
?
59. Project feedback so far…
Lack of business engagement
Cross-team collaboration was lacking
Different goals between Marketing, Development, Commercial Finance and UX
What was built wasn’t necessarily what was required
62. Project F == Conversion Rate Optimization (CRO)
• Improve Join Conversion by 5%
• Increase uptake of Extra in Join to 6%
• Increase uptake of Multi in the join process
• Improve upgrades to Extra
• Form a cross-functional team consisting of:
• Marketing, Development, Commercial Finance and User Experience
63. Define a 90 day goal…
What do we want to achieve in the next 90 days?
• Improve Join Conversion by 5%
How can we build the right thing?
70. What is the next problem to solve?
• How has our understanding changed?
• What do we know now that we didn’t before?
• What can we do next to add more value?
74. Small Change, Bugs & GSD Siloes
Project G
JA
UX
KE
QA
DG
CTO
JK
Dev
Small Change
MD
Dev
GSD
Bugs
Project H
BI
TM
QA
DM
Dev
MB
Dev
AW
Dev
RA
Dev
JM
Dev
BL
Dev
QA
75. Small Change, Bugs & GSD Siloes
Small Change
GSD
Bugs
AW
Dev
RA
Dev
BL
Dev
Acquisition
Retention
Gym Team
Member Services
Unplanned Work
Neglected Work
Unknown Dependency
Conflicting Priorities
Too much WIP
76. Initially consolidated into a single delivery team
JA
UX
Small Change
GSD
Bugs
MN
QA
GH
Dev
RA
Dev
BL
Dev
Acquisition
Retention
Gym Team
Member Services
78. Overall Team Structure
Retention
Gym Team
Commercial Finance
Project X
Project Y
Project Z
Projects > 4 weeks
Small Change
Bugs
2 – 4 days
Handover after projects finished
Delivery Team (BAU)
GSD
79. Devil in the detail
Backlog
Should be items 2-4 days
80. Devil in the detail
Backlog
Should be items 2-4 days
Turned out to be 6 weeks' work, but had to be done!!
There had been engagement with external agencies
Full design documents had been completed
Class Page re-design
81. How did we miss it?
Backlog
Turned out to be 6 weeks' work, but had to be done!!
There had been engagement with external agencies
Full design documents had been completed
Class Page re-design
This work was not visible!
Unknown Dependency
91. ICE - Confidence
10 - Lots of launch data
9 - Some launch data
8 - A little bit of launch data
7 - Test results - longitudinal user study
6 - Test results - large scale MVP
5 - Test results - A/B tests
4 - User evidence - lots of product data
3 - User evidence - top user requests
2 - User evidence - 20+ interviews
1 - User evidence - Usability study
0.75 - Market Data - Surveys, smoke tests
0.5 - Anecdotal evidence - a few product data points
0.4 - Estimates & plans
0.3 - Other's opinion - external expert
0.2 - Thematic support - vision/strategy/trends
0.1 - Thematic support - outside research
0.03 - Pitch deck
0.01 - Self conviction
92. Idea – Live chat will reduce calls to Member Services
Impact x Confidence x Ease = ICE Score
5 x 0.5 (Anecdotal evidence) x 8 = 20
Idea – Improving the member dashboard will reduce calls to Member Services
2 x 4 (User evidence) x 7 = 56
93. Linking between boards
Idea board: Requires Impact Analysis, Do Next, In Delivery, Done, Requires Discovery, In Discovery
Delivery board: Up Next, In Dev, In Test, Ready to Deploy, Blocked, In Discovery
94. Delivery Team Process
• Focus on delivering 1 goal at a time
• Identify key stakeholders
• ICE Scores on Ideas
Idea Prioritisation
Week 1 Week 2 Week 3
Focus on Acquisition
Idea Prioritisation
Step Review /
Prioritisation
Step Review /
Prioritisation
Acquisition Retention
Gym Team Member Services
Focus on Retention
All stakeholders involved in prioritization
(Idea board) (Idea board)
(Delivery board) (Delivery board)
97. Team Structure – about 20 people
Acquisition Retention
USA Team
Commercial Finance
Project X
Project Y
Revenue
Project Z
Member Services Gym Team
Delivery
Team
GSD
98. Team Structure – about 20 people
Delivery
Team
Acquisition Retention
USA Team
Commercial Finance
Project X
MOBILE
Revenue
Project Z
Member Services Gym Team
GSD
99. Hire mobile developers
Image courtesy of: https://www.elinsys.com/blog/hiring-right-mobile-app-development-resources-makes-difference-elinsys-blog/
Double the team size as fast as possible!
107. Metrics of high performing teams
• Deployment frequency
• Lead Time for Changes
• Mean Time To Recover (MTTR)
• Change Failure Rate
108. Don’t just grow mobile
Delivery
Team
Acquisition Retention
USA Team
Commercial Finance
Project X
MOBILE
Revenue
Project Z
Member Services Gym Team
GSD Team
118. So how do we change the team structure?
Retention
Gym Team
Commercial Finance
Project X
Project Y
Project Z
Delivery Team
GSD
119. The 1st Way: DevOps Principles
1st Way: Systems Thinking and Flow
120. A bath tub is a system
Water flows out
Maintain a constant “Stock” of water
Water flows in
As water flows out
adjust the input to
re-fill
121. A bath tub is a system
• A bath tub is a system
• A system is made up of inputs, stock (WIP) and outputs
• Feedback loops help a system to adapt to change (re-enforcing, stabilising)
• Feedback loops can help identify bottlenecks
• Feedback loops can help visualise WIP (Stock)
• Feedback loops can help encourage conversation
• Feedback loops can help encourage collaboration
126. Conway's Law
“Organisations which design systems…are
constrained to produce designs which are
copies of the communication structures of
these organisations”
Mel Conway, 1968
127. Reverse Conway Maneuver
“Organisations should evolve their
team and organizational structure
to achieve the desired architecture.
The goal is for your architecture to
support the ability of teams to get
their work done…without high
bandwidth communication
between teams”
128. Not Optimized for Flow of Change
• Short-lived project teams
• Project teams add new features
• BAU team handles bugs
• GSD handled by a separate team
• Risks handled by a separate team
• No ownership of code
Projects Delivery
129. Optimize for Flow of Change
• Long-lived stream-aligned teams
• Handle new features, bugs, GSD and risks
• Ownership of code
132. Litmus Test for
Fracture Plane
“Does the resulting architecture support
more autonomous teams (less dependent
teams) with reduced cognitive load?”
134. New Team Structure
Acquisition Retention Payments Gym
Exerp Gateway Developer Experience
Mobile Small Change Mobile Digital Coach
Stream Aligned Teams
Enabling Teams
System Reliability
This talk is a story about the Agile journey that the team at PureGym have been on over a period of 4 years.
We’ll touch on how it all started, discuss some of the processes and practices implemented, cover some of the challenges encountered and describe how the team adapted and changed over time.
Feel free to play buzz word bingo throughout the presentation and call out “house” at any time – although you may need to use the “raise your hand” button in the Zoom chat.
PureGym is the largest gym chain in the UK and one of the fastest-growing UK startups in the fitness industry.
It has grown from 70 gyms to over 250 in the UK within the last 4 years, recently purchased Fitness World in Denmark and is expanding into the US market.
PureGym places a large emphasis on the use of technology to enhance the member experience on their mission to make the world a fitter place.
Until the recent turn of events with COVID-19, PureGym was set to become a major player in the global fitness industry.
In 2014, everything at PureGym was outsourced including the entire IT function.
The company was seeing good growth and they were investing in technology but their outsourced supplier began taking longer and longer to deliver new features.
This culminated in an outage in the first week of January 2015 that lasted a couple of days and meant members could not sign up.
The fitness industry is a very seasonal business and any missed revenue during the peak periods can significantly affect the year-end results.
PureGym’s business model is to have low cost, high-quality tech-enabled gyms and to achieve that the only way members can join the gym is through their website.
If the website is not available, then members cannot join and this can result in significant lost revenue.
In June 2015, after the outage, Humphrey Cobbold (CEO) hired Daniel Glyde as CTO and challenged him to set up an internal development team, internalise the IT function and bring the website development in-house to avoid costly future outages. The goal was to move away from the outsourced supplier by May 2016.
Daniel accepted the challenge and set about building a team.
Some of his first hires included John Kilmister (Developer), Kieran Edwards (Tester), Justin Amphlett (UX Designer) and myself (Developer).
So the goal was to re-platform the existing website and members area.
The core tech stack chosen was Umbraco CMS running in Azure with React .NET for JavaScript.
Click: At the time Azure was still on the “Old” portal
Click: And React .NET was a bit of a punt as it was fairly new tech back in the day
For our continuous integration pipeline we settled on Bitbucket for our code repository, TeamCity for builds and Octopus for deployments.
We used Jira (for its sins) for managing tickets and Confluence to capture documentation – there was discussion around using SharePoint at one point but that idea quickly got shut down.
Alpha for development and test
Beta as like for like production
Production – customer facing
Gamma – load testing
We put the site behind Cloudflare and configured Traffic Manager in Azure with North Europe as the active region and West Europe as the secondary region
Performed Beta failover once per month
Performed Production failover once every 3 months
We had a clear backlog (the entire existing website) and a self-imposed tight deadline, so we set about scaling up to 16 people.
Meanwhile we began putting in place the CI pipelines and defining basic processes and practices so that we could hit the ground running when the new people arrived.
We wanted to follow an Agile delivery process. I’ve included this famous diagram from Henrik Kniberg for those that haven’t seen it.
Because we wanted to follow an Agile process our instinct was to deliver value early and often, but because we were porting the existing website we couldn’t go live until there was sufficient functionality in the new website.
Therefore the best we could do was deliver to a UAT environment and do a phased rollout using Cloudflare to manage the amount of traffic.
When we started looking at the existing site in closer detail it became apparent that there were general UX and performance issues which led us to ask whether the business really wanted a “complete re-build” of the site.
The answer came back as NO, so we needed to work quickly to identify all of the features and get the business to prioritise these.
So we used story mapping to define the actors, activities, tasks and stories required to deliver the features.
Then we planned each release with the focus being on building a “potentially shippable product” at the end of each sprint.
At the time we used physical boards and the user story map took up the whole of the back wall of the office.
At the time we chose to deliver the project using Scrum following two week sprints.
Why?
Structured process
People had most experience following it
Clearly defined ceremonies
A well understood backlog to be delivered
So we created two, two-pizza sized teams.
One scrum master, one business analyst
We attempted to use planning poker with story points to get a general feel for velocity and whether we were on track to hit the target deadline
We used burn down charts to track progress and forecast whether we were going to hit the deadline
We performed demos and retrospectives at the end of each sprint
Due to the tight deadline we took the purposeful decision to create a monolith application
We adopted the GitFlow branching strategy.
Multiple feature branches were created from the develop branch and merged back into it. The develop branch was then merged into the release branch, and eventually into the master branch, which was released to production.
Merges into master were performed at the end of the sprint by the QAs
Our deployment frequency into “faux” production was once every 2 weeks, with a phased rollover from the existing website using Cloudflare page rules at the end of the project.
We managed to hit the target deadline, on time and to budget.
We got buy in from the rest of the business that having an internal team was the right way to go.
At the end of the initial project the team scaled back to 8 people
It was now a time for reflection
In the words of William Edwards Deming, “We should work on our process, not the outcome of our processes”.
For those that don’t know, Deming is widely acknowledged as the leading management thinker in the field of quality.
He was a statistician and business consultant whose methods helped hasten Japan's recovery after the Second World War and beyond.
He was at least partly responsible for Toyota’s gleaming quality reputation through the introduction of Total Quality Management, which is closely related to what became known as “lean manufacturing”, with its focus on eliminating waste in a process and on continual improvement.
The essence of continuous improvement is as follows:
Decide what you want to achieve
Go and notice what is getting in the way
And then do what needs to be done to remove the biggest impediment
So what happened when we looked back at how the project went?
First of all, hiring lots of high quality developers is a great idea but at the time no-one really had a clear definition of what the ultimate deliverable would look like.
Each developer had good experience and valid war-stories about specific ways of doing things and could back up their arguments with evidence that may have been contradictory to another person’s opinion.
This resulted in a lot of “churn” and “heated” discussions around what particular practices and processes should have been followed.
We felt there were potential issues with the GitFlow process.
Click: First off, the releases were performed by QA, which meant they were an instant bottleneck
Click: Secondly, the flow seemed to encourage long running feature branches which ultimately resulted in more Work In Progress
Click: And thirdly, we felt that although the process was well aligned to sprint based delivery it very often resulted in merge hell
There were also other associated issues with the sprint release process.
Essentially we were using a batch flow which meant that two weeks worth of work was being batched up and merged at the end of the sprint
Click: After the code was deployed into production if any bugs were encountered it became much more difficult to identify which particular commit caused the issue and the rollback strategy was certainly more challenging so our ability to recover from a potential issue was hampered.
Ultimately, we found that the Scrum process forced an arbitrary deadline, which resulted in artificial preparation for the “end of the sprint” and then lots of time spent planning the next sprint, calculating story points, etc.
So it was time for a change
We were a bunch of developers who owned all of the infrastructure and configured it ourselves
Using Azure allowed us to do this but we didn’t really get what DevOps really meant.
Click: So Devs simply doing Ops is not equal to DevOps
The DevOps Handbook provides a good overview of general DevOps principles, practices and processes.
DevOps builds on the already good practices of software development and aims to bring them to the whole organisation.
We don’t have time to go into a deep dive on DevOps but there are 3 key principles that you should take away.
Click: The first is systems thinking and flow – how do we get business value to the customer in the most efficient way
Click: The second is amplify feedback loops – how can we make our work visible and shorten the time it takes to take action
Click: The third is a culture of continual learning and experimentation, which aims to promote a safe, blameless working environment and encourages collaboration across the entire organisation
With this in mind, here are some of the changes we made…
How can we reduce the time it takes to deliver value to the customer?
Instead of waiting to put the item in to a batch or sprint release, deploy a single small piece of functionality at a time – when it is ready ship it.
This reduces the time it takes to recover from an error – so when we ask the question “what caused the problem?”, we know it was that ticket and we can have a much simpler rollback strategy.
Click: It also means that a developer can take ownership of the entire ticket from development through to production and monitoring, which shortens the feedback loop for any error that occurs with their ticket
To help achieve this improved flow model we implemented Blue/Green deployments using staging slots in Azure App Services
We built a feature flag mechanism that allowed us to decouple our deployments from our releases – the code could be deployed into production behind a feature flag and then released by turning on the feature flag at the required time.
This also allowed us to perform incremental roll outs of features and test functionality on a per gym basis.
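To make the idea concrete, here is a minimal sketch of a per-gym feature flag. It is not the mechanism we actually built. The `FeatureFlag` class, the flag name and the gym ids are all hypothetical and only illustrate how a deployment can be decoupled from a release.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """Hypothetical flag: off, on for selected gyms, or on everywhere."""
    name: str
    enabled_everywhere: bool = False
    enabled_gym_ids: Set[int] = field(default_factory=set)

    def is_enabled(self, gym_id: int) -> bool:
        # A feature is visible if it is fully released or piloted in this gym.
        return self.enabled_everywhere or gym_id in self.enabled_gym_ids


# The code ships to production with the flag off: deployed, but not released.
live_chat = FeatureFlag(name="live-chat")

# Incremental rollout: turn the feature on for a couple of pilot gyms first.
live_chat.enabled_gym_ids.update({3, 17})

print(live_chat.is_enabled(3))  # pilot gym sees the feature
print(live_chat.is_enabled(4))  # every other gym does not, yet
```

A full release is then just a matter of setting `enabled_everywhere` to `True`, with no new deployment required.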
Introducing single piece flow meant we increased our deployment frequency from once every 2 weeks to 8 times a day.
At this point our build and deploy process took about 1 hour and included all unit and UI regression tests
We also moved to a simplified branching strategy which is more closely aligned to Trunk Based Development and encouraged shorter lived feature branches
When it comes to daily stand ups we want to encourage flow through the board as much as possible
Click: So to do that we walk the board from right to left
Click: The things on the right are the closest to delivering value
Click: Use WIP limits to highlight bottlenecks and encourage feedback of potential issues – are the testers struggling with something?
Click: Include a blocked column and make this the last thing we talk about so that we look to unblock these as soon as possible
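The stand-up routine above can be sketched in a few lines of code. This is an illustration only: the column names and WIP limits are taken from the board on the slide, while the `walk_board` function and the ticket ids are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

# Columns in left-to-right board order; None means the column has no WIP limit.
BOARD: List[Tuple[str, Optional[int]]] = [
    ("Up Next", None),
    ("In Dev", 5),
    ("Pre Test Review", None),
    ("Ready for Test", 3),
    ("In Test", 3),
    ("Final Review", None),
    ("Ready to Deploy", None),
]


def walk_board(tickets: Dict[str, List[str]]) -> List[str]:
    """Return stand-up talking points, walking the board from right to left."""
    points: List[str] = []
    for column, wip_limit in reversed(BOARD):
        in_column = tickets.get(column, [])
        # A broken WIP limit is a prompt for a conversation, not a failure.
        if wip_limit is not None and len(in_column) > wip_limit:
            points.append(f"{column}: WIP limit broken ({len(in_column)}/{wip_limit})")
        points.extend(f"{column}: {ticket}" for ticket in in_column)
    # Blocked items are the last thing discussed.
    points.extend(f"Blocked: {ticket}" for ticket in tickets.get("Blocked", []))
    return points


# Hypothetical ticket ids, purely for illustration.
example = {
    "In Dev": ["PG-101", "PG-102"],
    "In Test": ["PG-103", "PG-104", "PG-105", "PG-106"],
    "Ready to Deploy": ["PG-107"],
    "Blocked": ["PG-108"],
}
for point in walk_board(example):
    print(point)
```

Starting from the right-hand side keeps the conversation on the work that is closest to delivering value.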
One of the major challenges of delivering any project to a specific deadline is the accumulation of tech debt.
Tech debt is essentially the “cruft” that makes a system harder to understand or reason about, and the interest payments are the extra effort that later changes then require.
One of the biggest tech debt issues we encountered was around the introduction of BDD and SpecFlow.
Unfortunately, not everybody on the team had sufficient time or training to get up to speed on the best practices and processes when using SpecFlow and BDD.
This resulted in the test framework becoming very difficult to reason about and too large to keep in your head, so we took some time to understand where the issues were and created a specific approach using a combination of Contexts, Mocks and Fakes – this might be the subject of a future talk.
We introduced 20% time which allowed each of the developers to spend time working on something that would improve the process or provide a new feature for the business that was not currently aligned with what they were working on.
This allowed some of the devs some time and space to address those tech debt issues.
Some other things that came out of the innovation time include (but not limited to):
Hello, I can help with the following commands:
* PG-[ticketNumber] (e.g PG-2345) - I'll provide you with information on any Jira ticket id
* [searchTerm] gif (e.g. cat gif) - I'll find a random animated gif for you of your choosing
* What gym is id [gymId] (e.g. what gym is id 3) - I'll find out the gym id for you
* What id is gym [gymName] (e.g. what id is gym London Aldgate) - I'll find out the gym name for you
I can also create test members using the following command:
Create a|an [memberType] member in gym [idOrGymName] starting [yyyy-mm-dd]
--
* Note that including a starting date is optional.
--
Choices for member types include:
* daypass
* 9 month pif
* 12 month pif
* standard
* off peak
* cluster
* multi
* national
* extra
* extra national
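As a rough illustration of how a command like this can be handled, here is a small parser for the "create member" command. It is a sketch, not the bot's actual code. The regular expression, the `CreateMember` type and the `parse_create_member` function are all assumptions. Only the command wording and the member types come from the help text above.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

MEMBER_TYPES = {
    "daypass", "9 month pif", "12 month pif", "standard", "off peak",
    "cluster", "multi", "national", "extra", "extra national",
}

# Create a|an [memberType] member in gym [idOrGymName] starting [yyyy-mm-dd]
COMMAND = re.compile(
    r"^create an? (?P<member_type>.+?) member in gym (?P<gym>.+?)"
    r"(?: starting (?P<start>\d{4}-\d{2}-\d{2}))?$",
    re.IGNORECASE,
)


class CreateMember(NamedTuple):
    member_type: str
    gym: str                # gym id or gym name, exactly as typed
    start: Optional[date]   # None when no starting date was given


def parse_create_member(message: str) -> Optional[CreateMember]:
    """Parse the command, returning None if it is not recognised."""
    match = COMMAND.match(message.strip())
    if match is None:
        return None
    member_type = match["member_type"].lower()
    if member_type not in MEMBER_TYPES:
        return None
    try:
        start = date.fromisoformat(match["start"]) if match["start"] else None
    except ValueError:  # well-formed but impossible date, e.g. 2020-13-01
        return None
    return CreateMember(member_type, match["gym"], start)


print(parse_create_member("Create a standard member in gym London Aldgate starting 2020-01-06"))
print(parse_create_member("Create an extra national member in gym 3"))
```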
The deploy monitor was built to visualize the deployment process from code check-in through build and deploy. It combined the data from Bitbucket, TeamCity and Octopus and was displayed on TVs on the wall.
It was also enhanced with Text to Speech and would announce when an action needed to be taken: “Richard – Manual intervention is required on feature PG-4763”. Nicknamed “Holly”, she also provided insights and comments when code got released like “Good job man”.
Another invaluable tool was the visualisation of the requests made to our third party membership management provider.
This would be on another TV and when we saw patterns like the above we knew their system was down.
In the early days we had some interesting conversations with them about whether the system was actually down or not – eventually they would believe us, restart a node and things would return to normal.
In order to share knowledge we would perform technical show cases and record them to ensure that knowledge was captured and shared amongst the team.
Project – Anything greater than 4 weeks work and typically these would be about 3 months of work (shield project members from distraction/impact of bugs)
Small Change – changes between 2 – 4 days, small improvements
Bugs – Production issues
GSD – Getting Stuff Done – infrastructure updates, process improvements, upgrades
QA and UX shared across each of the projects, small change and bugs
To begin with we attempted one person per project, one on small change, one on bugs and one on GSD
1 person per project meant that work was siloed, everyone had their own goals and deadlines and didn’t have “time” to help other people
So the next project we said that we should have a minimum of 2 people per project
Each project should have a kick off with key stakeholders
And at the end of the project there should be a review
In order to support the BAU work we would have one person for small
In order to get more projects done we hired more people and created more project teams
But we kept the BAU structure of one person per Small Change, Bugs and GSD
After delivering a couple of projects we found that there were a couple of common issues…
We tried an experiment with one of the projects and attempted to apply some of the principles from these books
Marketing adopted the Kanban process and used it in other areas of the team for digital activity
Some things that would have had a detrimental effect were not developed
Process wasn’t really adopted outside of that team by other project teams
The next book I want to introduce to you is Making Work Visible – Exposing Time Theft to Optimize Flow by Dominica DeGrandis
Key concepts include the 5 time thieves:
Unplanned work
Neglected work
Unknown dependencies
Conflicting priorities
Too much WIP
We were finding that the people assigned to the BAU streams of small change, bugs and GSD were suffering from a number of these issues.
Neglected work – Tech Debt accumulated in projects due to tight project deadlines, poor project handovers
Constant context switching to different areas of the code - not able to keep all the information in their head
Conflicting priorities – each stakeholder had different goals with no clear direction on priorities
No desire to collaborate across teams – different KPIs, Bugs and GSD “owned” by dev, small change owned by the business
Different stakeholders and deliverables
Lack of shared understanding
This all resulted in poor morale amongst the team and people did not look forward to being in one of the siloes for 3 months at a time
To address this we considered creating a team called the Delivery Team which would handle Small Changes, Bugs and GSD
However, we quickly realised that the goals of the GSD team and the goals of the business facing team were not aligned in a way that produced high quality meetings
So we created a smaller focused delivery team that would serve the business and help deliver both small changes and bugs – these could now be prioritised amongst each other.
This team should also have a dedicated QA
We also decided that GSD needed more than one person as there was just simply too much to do.
So the team structure was a number of short lived project teams
Click: which would handover to the delivery team
Click: and a GSD team which would own the GSD backlog and implement the non-functional aspects
Things were going generally okay until something happened.
Each item in the small change queue should be 2-4 days work
However, we came across a piece of work hiding in the detail
It turns out that we weren’t visualising the upstream process outside of the team
When we look at the current delivery team Kanban board, what’s missing?
Well, if we look back at the Side-by-side discovery and delivery process we can see that we are not making this work visible anywhere.
Therefore there is no shared understanding of work that is flowing to us.
So we added an additional column
But we didn’t currently have a way of prioritizing or planning things that should be discovered so we investigated the introduction of a new process known as GIST
GIST stands for Goals, Ideas, Step-Projects and Tasks
Goals are typically set on an annual basis and aligned with what the business wants to achieve
Ideas are a series of items that might move us towards that Goal
Step projects are small pieces of work
Ideas are ranked using an ICE score (Impact x Confidence x Ease), which plays a similar role to Weighted Shortest Job First
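As a small worked example, the ICE calculation can be sketched as follows. The two ideas and their numbers are the ones shown on the ICE slide. The `Idea` class itself is just an illustration and not part of any tool we used.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    title: str
    impact: float      # estimated impact of the idea
    confidence: float  # value taken from the confidence scale (0.01 to 10)
    ease: float        # how easy the idea is to deliver

    @property
    def ice_score(self) -> float:
        # Impact x Confidence x Ease = ICE Score
        return self.impact * self.confidence * self.ease


ideas = [
    Idea("Live chat will reduce calls to Member Services",
         impact=5, confidence=0.5, ease=8),
    Idea("Improved member dashboard will reduce calls to Member Services",
         impact=2, confidence=4, ease=7),
]

# Highest score first: this is the order in which ideas would be considered.
ranked = sorted(ideas, key=lambda idea: idea.ice_score, reverse=True)
for idea in ranked:
    print(f"{idea.ice_score:g}  {idea.title}")
```

The dashboard idea scores 56 against 20 for live chat, largely because it is backed by user evidence rather than anecdote.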
So now we have two boards that we can link the tickets between and we can perform high level business level planning using the Idea board and lower level delivery planning on the delivery board
Focus on a single stakeholder for a fixed period of time based on the owner of the “Idea” containing small changes – bugs related to the Idea could also be prioritised against new features
As a process for the delivery team it worked well
Having a much more structured approach with a team that is able to share the workload worked well
Unfortunately, the general approach didn’t gain traction beyond the delivery team and was known as the Delivery Team process for a long time
The next section takes concepts from the following books:
So at this point in time we were structured like this:
A series of short lived project teams, a BAU Delivery Team and a GSD team
Then the next big idea came in from management – we need to focus on mobile…
So obviously the thing to do is hire mobile developers and double the team size as soon as possible and the hiring process began!
Until we said stop! And asked the question “How do we know that we need more mobile developers?”
If we are going to double our team size can our current processes that work for 20 people also work for 40?
The Theory of Constraints, which is introduced in Goldratt’s book The Goal and re-imagined in The Phoenix Project, states that at any point in time there will be a weak link in the chain and this is the bottleneck.
We cannot deliver business value any faster than our current bottleneck allows, so there is no point in hiring more developers if we don’t also increase our capacity to deliver business value.
Otherwise stock will just build up which will increase Work In Progress and ultimately become wasteful.
Therefore we needed to work out what the bottleneck might be.
Our current bottleneck was deployment frequency: our maximum throughput capacity was essentially 8 deployments per day.
If we double our team size would our current capacity be sufficient?
If we exceed our delivery capacity we will just end up with work piling up waiting to be delivered – which is wasteful
According to “Accelerate – building and scaling high performing technology organisations”, if we are a high performing organisation (which is what we want to be) the number of deploys we do per day should increase at an increasing rate as we add more employees
Was this going to be the case?
Accelerate also introduces a number of other metrics such as Lead Time for Changes, Mean Time to Recover and Change Failure Rate which we were aware of but didn’t have the capacity in our existing team structure to capture and then make decisions based on these metrics.
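To show what capturing these metrics might involve, here is a minimal sketch that derives all four from a list of deployment records. The `Deployment` record, the `summarise` function and the sample data are hypothetical. In practice the timestamps would come from the source control, build and deployment tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    recovered_at: Optional[datetime] = None  # when service was restored


def summarise(deployments: List[Deployment], days: int) -> dict:
    """Compute the four metrics over a period of `days` days."""
    if not deployments or days <= 0:
        raise ValueError("need at least one deployment and a positive number of days")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "deployment_frequency_per_day": len(deployments) / days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recover": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Made-up sample: four deployments in one day, one of which failed.
start = datetime(2020, 1, 6, 9, 0)
hour = timedelta(hours=1)
sample = [
    Deployment(start, start + 2 * hour),
    Deployment(start + hour, start + 5 * hour, failed=True,
               recovered_at=start + 5 * hour + timedelta(minutes=30)),
    Deployment(start + 3 * hour, start + 6 * hour),
    Deployment(start + 4 * hour, start + 7 * hour),
]
metrics = summarise(sample, days=1)
for name, value in metrics.items():
    print(name, value)
```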
So instead of just growing the mobile team….
We also sought to grow the GSD team and give it the capacity it needed in order to capture, analyse and provide guidance on these metrics.
After some initial analysis the team identified that we would need to double our current deployment capacity to 16 deploys a day.
And given our current process we could achieve this by optimising the current build and deploy pipeline, ensuring that deployments took no more than 30 mins.
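The arithmetic behind those numbers can be sketched as below. It rests on two assumptions that are not stated in the talk: that pipeline runs are serialised (one build and deploy at a time) and that deployments happen within an 8 hour working day. The function name is made up for the example.

```python
def max_deploys_per_day(pipeline_minutes: float, working_hours: float = 8.0) -> int:
    """Upper bound on deployments per day when pipeline runs cannot overlap.

    Assumes one pipeline run at a time and an 8 hour working day by default;
    both are illustrative assumptions.
    """
    if pipeline_minutes <= 0:
        raise ValueError("pipeline_minutes must be positive")
    return int(working_hours * 60 // pipeline_minutes)


print(max_deploys_per_day(60))  # a 1 hour build and deploy
print(max_deploys_per_day(30))  # the 30 minute target
```

Under those assumptions a 1 hour pipeline caps us at 8 deployments a day, and halving it to 30 minutes doubles the cap to 16.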
So the GSD team went through their own Goal based planning, prioritisation and delivery of incremental improvements that would move us towards the desired capacity whilst additional developers were hired for mobile.
So we began the scale up to forty people whilst the GSD team worked frantically to improve our current throughput capacity.
However, it soon became apparent that in order to truly increase our deployment capacity and overall throughput capacity we needed to address the elephant in the room…the monolith!
At this point we put together a team to analyse various different approaches we could take to breaking down the monolith.
We had a number of workshops and tried to involve all members of the team.
The main outcome of this review was that we wanted to move to more of a microservice architectural model; however, our current team and process structure did not lend itself to it.
Nobody really “owned” the code and the cross cutting project teams created dependencies and conflicts between each other.
Also, nobody really owned the architecture, so it was decided that the GSD team should play a stakeholder type role in each of the projects.
We experimented with our first potential micro service by implementing a feature called Online Cancel.
The ultimate goal of this project was to enable members to have a cancellation experience similar to that of Audible, where the member is offered incentives to stay, e.g. if you are moving house, did you know there is a PureGym 5 minutes from your new house?
Click: We were looking to introduce an independently deployable microservice developed using Gatsby.js
Unfortunately the project got stopped before it could be fully completed, but it highlighted that the way the project teams were structured meant that teams and/or stakeholders could not commit to longer term architectural changes, due to the volatile nature of the project teams.
The following section takes concepts from these books:
Scaling Lean by Ash Maurya
Thinking in Systems by Donella H Meadows
Team Topologies by Matthew Skelton and Manuel Pais
Project to Product by Mik Kersten
How do we move away from short-term project teams to longer term product teams?
If you recall our ways of DevOps, the first way was “Systems Thinking and Flow”.
The book Thinking in Systems gives a really good insight into this way of thinking, and in it the author talks about how a bath tub is a simple system.
Water comes in through the tap and, if there is no plug, flows out through the plug hole. If the flow in and the flow out are matched via a feedback loop, the “stock” of water in the bath will remain at a constant level.
If we increase the input without increasing the output then we will end up with more “stock”, also known as Work in Progress.
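The bath tub can be simulated in a few lines of Python. This is a minimal sketch with made-up numbers: `inflow_per_day` stands for work arriving and `outflow_capacity_per_day` for the most the system can deliver in a day; both names are illustrative.

```python
def simulate_stock(inflow_per_day, outflow_capacity_per_day, days, initial_stock=0):
    """Track the stock (Work in Progress) at the end of each day."""
    stock = initial_stock
    history = []
    for _ in range(days):
        stock += inflow_per_day                       # water in through the tap
        stock -= min(stock, outflow_capacity_per_day) # water out through the plug hole
        history.append(stock)
    return history


# Inflow matches outflow: the level stays constant.
print(simulate_stock(inflow_per_day=8, outflow_capacity_per_day=8, days=5))

# Inflow doubled, outflow unchanged: the stock grows every day.
print(simulate_stock(inflow_per_day=16, outflow_capacity_per_day=8, days=5))
```

In the second run the stock rises by the difference between inflow and outflow each day, which is the Work in Progress build-up described above.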
If we consider that PureGym is also a system, it is therefore also a bath tub.
In Ash Maurya’s book Scaling Lean he talks about the concept of the Customer Factory, which is built around Dave McClure’s pirate metrics: Acquisition, Activation, Retention, Revenue and Referral.
The PureGym system acquires new customers, activates them when they enter the gym, retains them by keeping them coming back to the gym, generates revenue by charging a monthly subscription and gains referrals by delivering high value at low cost and encouraging members to talk and spread the word on social media.
So how can we organise our system in a way that optimizes for flow?
This is where we refer to the Team Topologies book. Although it is not prescriptive, it suggests a different way of thinking about and planning your organisation’s team structure in order to optimize specifically for the flow of business value.
The core reference is to Conway’s law, which states that organisations design systems that mirror their own communication structure.
The reverse Conway maneuver, as referred to in the Accelerate book, suggests we can make use of that law by organising the teams around the architecture that we want to achieve.
So instead of our existing team structure where we do a lot of handovers
We should aim to optimize the teams for efficient flow.
Around 5 people – limit of people with whom we can hold a close personal relationship
Around 15 people – limit of people with whom we can experience deep trust
Around 50 people – limit of people with whom we can have mutual trust
Around 150 people – limit of people whose capabilities we can remember
Scaling Teams Using Dunbar’s Number
Organizational groupings should follow Dunbar’s number, beginning with around five people (or eight for software teams), then increasing to around fifteen people, then fifty, then 150, then 500, and so on.
The project and delivery teams are gone.
Each team owns a list of applications or application components
They are responsible for features, maintenance and bugs
Each team has a Tech Lead and QA
Aligned to business unit with dedicated product owner or stakeholder
Individual business and architectural roadmaps
Cross cutting projects are a shared responsibility
I’m sure the evolution of the team structure will continue; this initial team structure may well evolve further based on the patterns and practices I have mentioned today.
If I was to start growing a team again today I would definitely start