7. Peeking Inside the Monolith
[Diagram: the Main Member-Facing Web UI (Account Customization, Member Store, Secure Messages, etc.) and the Member Liability Estimator SOAP Service Component, fed by Source Systems 1 through N]
8. Peeking Inside the Monolith
[Diagram repeated from slide 7]
9. Commit to the rewrite: A new API that returns the same results & supports “tiered” networks
11. The Strangler Pattern: An Iterative Rewrite
Benefits of a rewrite with reduced risk, faster time to value. Does require investment in the approach.
[Images: a strangler fig; the hollow inside of a strangler fig]
17. Pass-through & Log (in prod)
[Diagram: the 3rd Party Web UI now calls through Project X, which collects request/response data and passes through to the Monolith, backed by Source Systems 1–3]
1 week
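The pass-through step can be sketched as follows. This is a minimal Python sketch, not the actual implementation: the real system was a SOAP service behind a web UI, so `legacy_estimate`, the field names, and the in-memory `LOG` list are all illustrative stand-ins.

```python
import json
import time

# Hypothetical stand-in for the legacy SOAP calculation -- in the real
# system this was a call into the monolith's service component.
def legacy_estimate(request):
    return {"member_liability": 1500.00, "provider": request["provider"]}

LOG = []  # stands in for the request/response data store

def passthrough_and_log(request):
    """Forward the request to the legacy system unchanged, recording the
    full request/response pair for later analysis."""
    response = legacy_estimate(request)
    LOG.append({
        "ts": time.time(),
        "request": json.dumps(request, sort_keys=True),
        "response": json.dumps(response, sort_keys=True),
    })
    return response  # callers see exactly the legacy behaviour

result = passthrough_and_log(
    {"member_id": "M123", "procedure": "knee replacement", "provider": "P9"})
```

The point of the design is that the new component is invisible to users at this stage: it changes nothing about the response, it only observes.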
19. Log Both Results & Default
[Diagram: Project X collects request/response data for both the Monolith and the new Calculation Module, but defaults to the Monolith’s result → no risk of a bad result. Components: 3rd Party Web UI, Monolith, Source Systems 1–3, Project X, Calculation Module]
2 weeks
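The “log both & default” step amounts to shadow-running the new calculation next to the old one. A minimal sketch, with both calculators reduced to hypothetical stubs (the “tiered” mismatch is an invented example of a case the new module doesn’t yet handle):

```python
def legacy_estimate(req):
    # stand-in for the monolith's calculation
    return {"liability": 1500.00}

def new_estimate(req):
    # stand-in for the new calculation module; "tiered" plans are an
    # invented example of a case it does not yet get right
    return {"liability": 1500.00 if req["plan"] == "standard" else 1200.00}

COMPARISONS = []  # request/response data collected for both systems

def shadow_compare(req):
    """Run both calculations, log whether they agree, and always return
    the legacy result -- defaulting means no risk of a bad result."""
    old, new = legacy_estimate(req), new_estimate(req)
    COMPARISONS.append({"request": req, "old": old, "new": new,
                        "match": old == new})
    return old

matched = shadow_compare({"plan": "standard"})
mismatched = shadow_compare({"plan": "tiered"})
```

Every production request now doubles as a test case: agreement builds confidence, and each disagreement is a business rule waiting to be discovered.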
24. Starting to strangle stable cases
Started to turn off the path to the old system for some cases.
[Diagram: Web App UI → Project X → Calculation Module / Monolith → Source Systems 1–3]
5 weeks
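Turning off the old path for some cases is just a routing decision in front of the two calculators. A sketch, assuming the routing key is a case type and the set of stable types comes from the shadow-comparison data (all names here are hypothetical):

```python
# Hypothetical set of case types whose shadow comparisons have matched
# consistently enough to trust the new module with them.
STABLE_CASES = {"standard_office_visit", "generic_rx"}

def route_estimate(request, legacy_fn, new_fn):
    """Strangle incrementally: proven-stable case types go straight to
    the new calculation module; everything else stays on the monolith."""
    if request["case_type"] in STABLE_CASES:
        return new_fn(request)
    return legacy_fn(request)

# Stub calculators that just report which path handled the request.
picked_new = route_estimate({"case_type": "generic_rx"},
                            legacy_fn=lambda r: "legacy",
                            new_fn=lambda r: "new")
picked_old = route_estimate({"case_type": "knee_replacement"},
                            legacy_fn=lambda r: "legacy",
                            new_fn=lambda r: "new")
```

Growing `STABLE_CASES` over time is what makes the strangling iterative: each addition is reversible, and the monolith keeps handling anything not yet proven.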
27. Shut down the Legacy Calculation Path
Project X only calls into our new calculation module. We’ve now strangled a large part of the monolith!
[Diagram: 3rd Party Web UI → Project X → Calculation Module → Source Systems 1–3]
13 weeks
33. Options
1. Rewrite from scratch
2. Buy off the shelf
3. Do nothing
4. Containerize
5. Strangler Pattern
34. Build vs Buy → Build & Buy
Is it core to your business? Somewhere you want to differentiate? Will the buy option require a lot of customization, building logic into the system?
Often, the best option is both: build the differentiating parts, “buy” commodity components (e.g. don’t build your own SendGrid, don’t build your own Stripe, don’t build your own cloud platform).
35. When to ‘Do nothing’?
● No delivery pressures
● Low strategic importance
● Stable enough if not touched
● Opex costs under control
37–41. When should you rewrite?
Maturity/Traction of product
● Original product was way off the mark, didn’t achieve goals (e.g. no user adoption).
● Original product does not have traction.
● Significant deviation from the original intent of the product, going after a new market.
● Technology holding you back (Mainframe, Visual Basic, overly-customized SFDC or AEM).
● You can redefine the business process around the new system.
42. When to use the Strangler Pattern
● Well-established product with a significant user base
● A significant risk to revenue streams
● Lots of necessary complexity in your existing product (e.g. complex regulatory compliance rules)
● You don’t know the business rules in the existing system
43. Learnings/Takeaways
● This sounds technical, but don’t compromise User-Centred Design
● An opportunity to remove complexity
● Get laser-focused on what really matters (80:20)
● Don’t rebuild like-for-like
● When rewriting, take an iterative approach
44. How do you do this in your organization?
Start Small
Put together a business case around a subset of the capabilities that will deliver value over
a matter of months, not years. Frame it as a “no regrets” move with near term benefits.
Quantify Outcomes
Establish a baseline and measure against it (dev cycle time is good, but
cost/revenue/acquisition metrics are even better)
Use one win to build momentum for the next
By starting small, you can prove out the process and build support to keep going. Once you have a first win, a technical foundation, and an understanding of the system, you can “double down” and scale the effort.
Hey everyone, my name is Simon Duffy. I’m currently a product manager at Pivotal Labs and have been working with enterprise clients and startups back home in Australia for about 13 years, of which I’ve been building products for about the last 8. I was the product manager helping build the new app that is the protagonist of the case study we will be exploring today.
[Slide: Low confidence, Reduced velocity]
David and I will be presenting a case study on how we approached a pretty common technical challenge in the enterprise today: how to perform a legacy monolith application rewrite incrementally and with minimal risk, taking advantage of modern architecture and cloud infrastructure as well as building out some new business features.
We will provide:
● an overview of our product
● the challenge we found ourselves in
● how we pivoted to a data-driven delivery approach
● the steps of how we executed this
We’ll wrap with key learnings.
I will primarily speak to the product-management-related discussion points, and David to the implementation and engineering.
Let’s get into it. A quick raise of hands, how many of you in your careers have ever replaced a legacy system? Ok… so some of this might sound pretty familiar.
So in our scenario we were working with a large health insurance company tasked with extending the capability of a component of the monolith.
Our product would estimate the cost an individual would pay for medical procedures and the impact it would have on their health insurance coverage. Example: if you need a knee replacement, here’s how much it will cost at each of the providers within a given location, and here’s what it means for your health insurance annual deductible amount, coinsurance, etc.
The accuracy of the estimate was very important. If the procedure estimate was $100, a member went to the doctor, and the bill ended up being $1500, then the insurance company would typically pay the difference to the member so they weren’t left out of pocket.
[Slide: Existing app]
For further context, the monolith programmers had long since left the company, there was no documentation, and we were heavily reliant on a single expert who had worked with the system for years.
We had enough of an understanding of the monolith to know that change was super risky and that continued investment in old technology didn’t make sense. So we committed to a rewrite as a cloud-native app that would enable us to incrementally build out our new required features.
I would work very closely with the expert, and she would describe the behaviour of the system and surface the detailed logic of how the member cost liability was calculated for each provider. She would also describe how the various systems passed through information and what each of the various fields represented. Due to the complexity of the calculations, we were extremely lucky to have a 20-year expert on our team; otherwise it would have been almost impossible to start.
The expert and I would work together to break these calculations down into small user stories that our engineers would develop, and then we would manually test the result of the calculation against the result from the monolith.
And this was working well as the team was ramping up. After a couple of months, we had the core happy-path flows through the app developed.
It was as we were unpacking the nuance of the alternate flows through the system that things started to get tricky.
I recall one day we were looking at a 6,000-line XML response from a source system that included information about the relationship between a member and what insurance policy they had.
We had been using a particular field as a key element in defining that relationship and after troubleshooting some issues we realized that we were using that field incorrectly.
This was such a basic field. Gee... if we got this wrong, how many other things had we gotten wrong?
Over the following weeks, other similar scenarios started to surface that ultimately made us question our detailed understanding of all the different calculation permutations.
So we were a few months into the rewrite and our understanding of our product had deepened. A combination of unknown business rules, large lead times on QA, and a lack of robust testing was causing nervousness.
So one evening a few of us were sitting around, sharing our frustrations with David, and we started to think about how we could approach this differently.
The Strangler pattern is very well suited when there is a clear understanding of how to break down the monolith and you have high confidence in:
● what the business logic is that is being built into the new system
● that you can ensure MECE (mutually exclusive and collectively exhaustive) routing rules
We didn’t have either of these. Firstly, there were calculation complexities being newly discovered, and secondly, due to these complexities, we didn’t think we could ensure MECE routing. Sure, we could have spent the effort meticulously analysing the monolith codebase to tease out all these scenarios and, equally, what the routing rules were.
But we didn’t want to do that, so we thought about how we could build a system that would surface that information to us.
This initial step provided a huge amount of value to understand the product we were building and gave us 2 quick wins.
Firstly, we gained detailed insight into the utilization of the monolith. We could clearly identify all the different types of requests for different calculation types. We were finding new scenarios that had not previously been understood, and we could sharpen our feature prioritization by addressing the calculations for high-volume request types.
Secondly, we now had a vast set of test cases to leverage that represented real production data, so less time and effort was spent on test-data prep.
Automating the comparison of results between the systems was a big efficiency gain. We could now query our database for cases marked as errors and deep-dive into those, rather than manually assessing all cases.
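The kind of “show me the mismatches” query described here is straightforward once the comparison data sits in a database. A sketch using an in-memory SQLite table; the schema, case types, and figures are all invented for illustration:

```python
import sqlite3

# In-memory stand-in for the comparison database populated by the
# shadow-logging step; column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comparisons (
    case_type TEXT, old_result REAL, new_result REAL, match INTEGER)""")
conn.executemany(
    "INSERT INTO comparisons VALUES (?, ?, ?, ?)",
    [("office_visit",     100.0,  100.0, 1),
     ("knee_replacement", 1500.0, 1200.0, 0),
     ("knee_replacement", 1500.0, 1400.0, 0),
     ("generic_rx",       10.0,   10.0,  1)])

# Deep-dive query: which case types are producing mismatches, worst first?
mismatches = conn.execute("""
    SELECT case_type, COUNT(*) AS n
    FROM comparisons
    WHERE match = 0
    GROUP BY case_type
    ORDER BY n DESC""").fetchall()
```

The output of a query like this is effectively a prioritized backlog: the case types with the most mismatches are the business rules most worth unpacking next.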
In effect, we were testing, at scale, in real time in Production without putting the results for the user at risk!
It also allowed us to produce aggregate metrics on our performance that were reflective of the value our team wanted to track.
What are we looking at here? This is a picture of a key metric that we were tracking during our delivery: the percentage of matched cases on the vertical axis against time on the horizontal axis. The number on each post-it note is the percentage of cases that matched for that day.
Our goal was to move up and to the right, i.e. a higher rate of calculation matches between the old and new systems over time.
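Computing that daily match rate from the shadow log is a simple aggregation. A minimal sketch (the dates and values are made up; the real data came from the comparison store):

```python
from collections import defaultdict

def daily_match_rate(comparisons):
    """comparisons: iterable of (day, matched) pairs from the shadow log.
    Returns {day: percentage of cases that matched that day}."""
    totals = defaultdict(lambda: [0, 0])  # day -> [total, matched]
    for day, matched in comparisons:
        totals[day][0] += 1
        totals[day][1] += int(matched)
    return {day: 100.0 * m / n for day, (n, m) in totals.items()}

rates = daily_match_rate([
    ("2019-03-01", True), ("2019-03-01", False),
    ("2019-03-02", True), ("2019-03-02", True),
])
```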
As this metric was produced, it immediately became the metric we all cared about. Velocity, volatility, story cycle time... while these may be good reference points for assessing our team’s performance, how much of that actually mattered if we weren’t delivering accurate calculation outcomes?
So, we had a better metric that was more reflective of the value we were delivering.
Our metric allowed us to engage in a really cool discussion with our stakeholders on the financial risk of deploying our app into Production.
We were able to put a dollar value on what we thought it would cost to deploy into Prod and turn off the monolith by applying the following formula….
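The actual formula isn’t captured in these notes, so what follows is only a plausible sketch of the kind of expected-cost calculation being described, with every name and number invented for illustration:

```python
def expected_payout_cost(annual_request_volume, mismatch_rate,
                         avg_payout_difference):
    """Illustrative risk estimate: the cases where old and new disagree
    are the ones where the insurer might have to pay the member the
    difference, so expected cost scales with volume * mismatch rate *
    average gap. (Hypothetical formula, not the one from the talk.)"""
    return annual_request_volume * mismatch_rate * avg_payout_difference

# Illustrative numbers only: 100k estimates/year, 2% mismatch, $50 average gap.
risk = expected_payout_cost(100_000, 0.02, 50.0)
# Compare `risk` against the annual cost of keeping the legacy
# calculation path running to decide when it is cheaper to cut over.
```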
After progressively strangling calculations of high confidence, we reached a point where our stats were telling us we were hitting about 98% accuracy between old and new. This was the threshold where the possible financial impact was close to the cost of maintaining the legacy monolith component. We also knew that we were overstating the financial risk, as we were confident there were a few more false positives in there that hadn’t yet surfaced.
So we started to talk about completely shutting down the monolith’s calculation capability. Typically, legacy system cutovers are a big deal. However, this turned out to be a pretty simple decision. Referencing our metric from earlier, we had a clear view of how to make the call: is it cheaper to accept the possible financial impact of paying out differences to members, or to maintain the monolith?
It was time to ‘shut it down’. By it, I’m referring to the member calculation component of the legacy monolith.
Example: a real estate valuation reporting system. Built the logic, core flows, etc.; “bought” the PDF generation software.
PCF example on the platform side: AWS plus full-time employees on each team to manage AWS, write Chef scripts, etc., vs. focusing on your differentiators: your apps.
Example of the ETDB rewrite: SFDC, no traction, and a pivot from the original idea.