This is the story of how Microsoft moved from deploying every 2 years to deploying every 3 weeks for Azure DevOps and Visual Studio. These learnings have been replicated across the company.
12. Typical day at Microsoft
Data: Internal Microsoft engineering system activity, August 2018
12.4k Pull Requests per day
67k Git commits per day
78k Deployments per day
146k Builds per day
500m Test executions per day
500k Work items updated per day
5m Work items viewed per day
Azure DevOps Services is the toolchain of choice for Microsoft engineering, with over 90,000 internal users
https://aka.ms/DevOpsAtMicrosoft
54. Start with what is most important/most painful, and go from there
Designing metrics is as hard as designing features
Bake it into the review culture, from top to bottom. Cadence is the heartbeat; it spurs activity.
In pursuit of that goal, in the last year alone, I have visited 46 customers in 38 cities & 14 different countries. And sometimes I still get surprised by some of the strange and wonderful things that they do… it’s happening less and less…
Everyone has the same problems, faulty understandings, and dysfunctional behaviours… some are just better at working around them than others…
Visual Studio depends on Windows.
Windows depends on Visual Studio for compilers.
What happens when there’s a compiler bug?
Windows vendors the entire toolchain.
Plan, learn, react to feedback
Quality
Chaos
Enterprise scale
A product cycle looked something like this. It worked given the environment… but the environment has changed. We needed something different.
Today we look more like this. That is to say that our teams work in 3 week sprints… and we plan continuously. This is how we run the business.
Just move on to the next sprint, it’s right there.
I can’t tell you there was a day we made a decision to be Agile… instead, a group said “Hey, this agile thing sounds interesting… we want to try that”. The decision I made was not stopping them from trying Agile.
This approach aligns with what I was describing earlier… ALIGNED AUTONOMY.
We see alignment through the SCENARIOS and SEASONS… and we give teams autonomy through their use of STORIES and TASKS.
In fact, if you looked at my backlogs, and the backlogs of my teams… you’d see these exact terms.
When we started planning the service, our initial thinking was more like a box product
Started by looking at major/minor updates
All updates were major!
Story: December 2011 update went very badly and took a week to complete.
Larger sets of changes are harder to test and diagnose
Risk is proportional to the ship cycle
Ship frequently and stay near ship quality
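To make the “risk is proportional to the ship cycle” claim concrete, here is an illustrative back-of-envelope model (my own sketch, not a figure from the deck):

```latex
% Assume changes land at a rate $r$ per week, each carrying an average
% defect load $d$. The untested change batched into one release then
% grows linearly with the time $T$ since the last ship:
\[
  D_{\text{release}} \approx d \cdot r \cdot T
\]
```

Under those assumptions, going from a 2-year cycle (T ≈ 104 weeks) to a 3-week one shrinks every deployment’s batch of accumulated change by roughly 35x.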
Our organizational chart is by discipline. PMs report to PMs. Devs report to Devs. Testers report to Testers.
However, our business is managed through cross-discipline teams.
Melvin Conway made this observation; Fred Brooks coined the term for it…
Conway’s Law
“organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations”
Aka shipping the org chart
Hard requirements around release branches, while devs want freedom in topic branches
Starts with getting the branch workflows right in a single repo
Adding support for robust cross-repo sharing across release trains
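As a rough illustration of that split, a minimal sketch of branch policy (the branch-name patterns and policy fields are assumptions for illustration, not the actual Azure DevOps rules): release branches get hard gates, topic branches stay free.

```python
import re

# Hypothetical branch-name conventions, for illustration only.
RELEASE = re.compile(r"^releases/M\d+$")    # e.g. releases/M137 (release train)
TOPIC = re.compile(r"^(users|topic)/.+")    # e.g. users/alice/fix-login

def policy_for(branch: str) -> dict:
    """Return the (illustrative) protection policy for a branch."""
    if RELEASE.match(branch):
        # Hard requirements: reviewed, green build, history locked down.
        return {"require_review": True, "require_green_build": True, "allow_force_push": False}
    if TOPIC.match(branch):
        # Devs get freedom: rebase, force-push, experiment at will.
        return {"require_review": False, "require_green_build": False, "allow_force_push": True}
    # Anything else (e.g. main) defaults to the strict policy.
    return {"require_review": True, "require_green_build": True, "allow_force_push": False}

print(policy_for("releases/M137"))          # strict gates
print(policy_for("users/alice/fix-login"))  # free
```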
Tell stories here.
Code complete… we celebrated this achievement. But what did we have? A lot of code… with a lot of bugs. And no way to deliver it to customers.
Test & Stabilization… how do you think the team felt? Morale? Bad. We’re now just climbing a mountain.
No, we pay our debt as we go.
Our new bug curve looks more like this. We don’t let it ever grow out of control. This enables us to ship features when they’re ready… instead of only at these “big events”.
Availability and usage are hard to troubleshoot with all of this going on
Reinforce what this transformation made possible for our engineers, and the opportunity for it to do the same in their orgs and in the way they work with us as synergistic technologies
Find ways your methodologies/solutions could plug in
Synthetic tests – “test in production” used in the earliest days of the service; run broad functional tests against one test account; we left this behind pretty quickly
Command health – Aggregate availability number based on command pass/fail. This is a pretty standard model. It worked well for us when command volumes were relatively low. It lost sensitivity as command volumes grew.
Customer impact – The main message here is that we choose to create buckets to deal with the scale problem. We measure failed user minutes.
Pass/fail & performance grouped in time and aggregated to account then service
Dashed black line at the top is the Command Health (old) model – not sensitive
Solid black line shows Customer Impact
We now clearly see that there is a customer impact during this event even though failed/slow command numbers are small.
At a very high level… we’ve normalized so that every active account contributes an equal amount to the availability measure. This gives small accounts a voice.
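A minimal sketch of that “failed user minutes” idea (field names and the failure rule are assumptions; the real model also folds in slow commands and the performance buckets mentioned above):

```python
from collections import defaultdict

def availability(events):
    """Customer-impact availability: 1 - failed user minutes / active
    user minutes, normalized so every active account contributes equally."""
    # Bucket commands into per-account minutes; a minute fails for an
    # account if any command in that minute failed (or was too slow).
    minutes = defaultdict(dict)                 # account -> {minute: all_ok}
    for account, minute, ok in events:
        prev = minutes[account].get(minute, True)
        minutes[account][minute] = prev and ok

    # Each active account yields one availability number; the service
    # figure is their mean, so small accounts get an equal voice.
    per_account = []
    for buckets in minutes.values():
        failed = sum(1 for all_ok in buckets.values() if not all_ok)
        per_account.append(1 - failed / len(buckets))
    return sum(per_account) / len(per_account) if per_account else 1.0

# Tiny example: contoso fails 1 of its 2 active minutes (0.5), fabrikam
# none (1.0); the normalized service availability is their mean.
events = [("contoso", "09:00", True), ("contoso", "09:01", False),
          ("fabrikam", "09:00", True)]
print(availability(events))  # 0.75
```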
Classified alerts and reduced noise: repeat, non-actionable, …
One of our biggest achievements was tuning our alerts accurately enough to autodial the DRI without the need for human escalation. We achieved this by having a health model to eliminate noisy, redundant alerts, and smart boundaries to indicate when action is actually needed, as shown in Figure 15. It has given us a 40x improvement in alert precision, to the extent that by February 2015, all P0 and P1 alerts were routed correctly by the autodialer.
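A sketch of the kind of filtering that makes autodial viable (the dedupe window and the non-actionable classes here are assumptions, not the actual health model):

```python
import time

class AlertFilter:
    """Suppress repeat and non-actionable alerts before paging the DRI."""

    def __init__(self, repeat_window_s=900, non_actionable=("transient", "known-benign")):
        self.repeat_window_s = repeat_window_s      # dedupe window (assumed 15 min)
        self.non_actionable = set(non_actionable)   # alert classes that never page
        self._last_paged = {}                       # alert key -> last page time

    def should_page(self, key, kind, now=None):
        now = time.time() if now is None else now
        if kind in self.non_actionable:
            return False                            # log it, don't wake anyone
        last = self._last_paged.get(key)
        if last is not None and now - last < self.repeat_window_s:
            return False                            # repeat of a recent page
        self._last_paged[key] = now
        return True                                 # new and actionable: autodial the DRI

f = AlertFilter()
print(f.should_page("db-latency", "actionable", now=0))   # True
print(f.should_page("db-latency", "actionable", now=60))  # False (repeat)
print(f.should_page("cache-miss", "transient", now=0))    # False (non-actionable)
```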
Deployments don’t take much effort. We can redeploy rather than roll back.
Optimize for time to detect and time to mitigate, not mean time to failure: redundancy in the system
Root cause analysis on all of it
Revisiting telemetry and alerting to find earlier signs
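To make “optimize for time to detect and time to mitigate” measurable, a small sketch (the incident record layout is hypothetical):

```python
from datetime import datetime

# Hypothetical incident records: when the failure started, when
# telemetry/alerting detected it, and when it was mitigated.
incidents = [
    {"start": datetime(2015, 2, 1, 9, 0),
     "detected": datetime(2015, 2, 1, 9, 4),
     "mitigated": datetime(2015, 2, 1, 9, 30)},
    {"start": datetime(2015, 2, 3, 14, 0),
     "detected": datetime(2015, 2, 3, 14, 12),
     "mitigated": datetime(2015, 2, 3, 15, 0)},
]

def mean_minutes(deltas):
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60

ttd = mean_minutes([i["detected"] - i["start"] for i in incidents])      # time to detect
ttm = mean_minutes([i["mitigated"] - i["detected"] for i in incidents])  # time to mitigate
print(f"mean TTD: {ttd:.0f} min, mean TTM: {ttm:.0f} min")  # 8 min, 37 min
```

Tracking these per incident is what makes “find earlier signs” an actionable goal rather than a slogan.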
By being built on Azure, we get a lot of the failure handling/redundancy for free. Jeff’s whitepaper: aka.ms/vsosecurity
We’d ask for feedback after each milestone – planning, Beta, etc.
The problem was, there was never time to react to any of it. For the most part, we’d tell people “sorry”… and push those things off to the next release. We’ll get to that in 2.5 years when the next release comes out.
We’d find bugs with the process… and fix them. No problems there. But we couldn’t react to anything our customers using the product were telling us… or very, very little.
We’ve now got channels for feedback continually. We still have the “big event” feedback at preview, etc. But we’ve got a channel to talk to customers… constantly.
In fact, to make that a bit more real… here are examples from our release notes that we write every 3 weeks with updates to our service. At each of these intervals we’ve got a chance to listen to customers and react.
We’ve included in our process some lightweight methods to stay connected. Every 3 sprints we sit down with teams for a “chat”. This is a direct conversation with leadership about 3 things (next slide).
What’s next on your backlog?
Is debt under control?
Any issues?
Team chats are direct conversations with the leadership team. Every org has a layer in the “middle”… we’ve got that too, although we’ve done a lot to flatten our orgs in recent years.
Instead of this… we do this (next slide).
That’s not to say that the folks in the middle aren’t involved – they are. But we talk directly with the team.