- Layer 7 load balancing (L7 LB) is needed for modern applications as architectures and users have changed from monolithic to microservices and APIs, and from only humans to humans, systems, and things.
- Scaling modern applications requires choosing an architecture like horizontal duplication, functional decomposition, or data partitioning along the X, Y, and Z axes rather than just choosing a load balancing algorithm.
- Architecting for scalability involves variations of L7 LB techniques like data partitioning, URL/HTTP dispatch, dynamic routing, functional decomposition, and API metering and versioning. Key considerations for scaling include API design, user/entity identification, API versioning, and monitoring.
4. 1 Minute of Downtime
Data: Emerson Power
Costs an average of $7300
Average total cost of downtime per year across industries
PRODUCTIVITY IT PRODUCTIVITY LOST REVENUE
$53,608 $140,543 $183,724
9. And so are “users”
THEN
Humans
NOW
Humans, Systems, Things
10. L7 LB
Layer 7 Load Balancing
Modern Apps Need Modern Scale
11. Scaling Modern Apps
THEN: CHOOSING AN ALGORITHM
• Round Robin
• Least Connections
• Fastest Response
NOW: CHOOSING AN ARCHITECTURE
• Horizontal
• Functional
• Partitioning
19. L7 LB Architectures
Architecting Scalability
• Data Partitioning (Sharding)
• URL/HTTP dispatch
• Dynamic routing based on backend data
• Scaling by Functional Decomposition
• API metering
• App versioning / migration
• API deprecation
20. Things to Consider Sooner Rather than Later
Is your API design well-suited to scaling in any way other than POLB (plain old load balancing)?
How do you identify people, systems, and things?
How do you distinguish between API versions?
Multi-tenant or single-tenant microservices?
How are you monitoring (and what are you measuring with it)?
Emacs or vi?
The answers to these questions will impact your scaling architectural options*.
*Maybe not that last one.
Performance and availability. Business. Engagement. If we don’t scale, the business doesn’t grow and we impede productivity. We lose.
We don’t just scale today for technical reasons. Availability and its related metric of performance are tied to business metrics today like conversion rates, abandonment, bounce. There are real dollars tied to these operational metrics. And because in many cases – regardless of industry – apps are the business, there are real business metrics tied to them, as well, such as productivity and revenue growth.
Three hundred million dollars was spent to lay a high-speed fiber optic cable from the futures market in Chicago to the exchanges in New Jersey to improve the speed of stock trading (particularly high-frequency trading, or HFT) by 3 milliseconds. That’s 100 million dollars per millisecond. Most organizations work in bigger slices of time, like 100 milliseconds. A quick blink of your eye is 100 milliseconds.
Google considers page load latency in scoring ad relevance.
https://theeconomiccontrarian.wordpress.com/2014/04/01/what-is-worth-100-million-per-millisecond/
IT productivity is something we don’t often consider as a “thing” to be lost. But in the spring of 2000 I was a technical architect for a global transportation firm (they have big orange trucks ;-)) and we lost the primary application used to schedule and route shipments. For nearly 6 hours everyone not working to restore that system was diverted as runners between fax machines and the dispatch floor, trying to keep business running via sneaker net. That’s IT productivity lost right there, because I wasn’t coding – I was a messenger.
http://www.emersonnetworkpower.com/documentation/en-us/brands/liebert/infographics/documents/ponemon-infographic-cost%20of%20downtime-r11-13-final.pdf
We scale because
How do we scale (everything) today? Up or out.
How do we scale out? This is the monolithic approach.
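To make the "choosing an algorithm" side of this concrete, here is a minimal sketch of two of the classic algorithms, round robin and least connections, spreading load across identical copies of an app. The class names and the `POOL` of backends are hypothetical and only for illustration; no particular load balancer works exactly this way.

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in a fixed rotation, ignoring their current load."""

    def __init__(self, backends):
        self._rotation = cycle(list(backends))

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # min() returns the first backend among ties (dict insertion order).
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the request finishes so the count reflects live load.
        self.active[backend] -= 1


# Hypothetical pool of identical app instances (horizontal duplication).
POOL = ["app-1", "app-2", "app-3"]
```

Both pickers treat every backend as interchangeable, which is exactly the assumption that stops holding once the app is decomposed into services and versions.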
We need the ability to route, direct, and distribute load based on modern architectures that include highly decomposed applications (microservices), multiple versions of the same apps and APIs, and an increasingly diverse set of devices and things accessing those applications.
That’s where L7 LB comes in. It has the visibility necessary and the location in an architecture (up front) to make the kinds of decisions required to scale modern, distributed architectures.
And that’s exactly what we need – architectures. Scalability today is not just about throwing an LB in front of that API or service. It’s not even about how it fits operationally into the architecture. Every LB out there can be shoved into a VM or container and automated using APIs or scripts today. That’s the easy part. The harder part is about how to enable the deployment of scalable architectures – either by being flexible enough to route in a way that’s needed or by being a part of the architecture itself.
You want to be able to scale like Pinterest: from one little MySQL server to 180 Web Engines, 240 API Engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis Instances, and 200 Memcache Instances. The secret? Architecture.
Architecting for scale doesn’t mean just choosing stateless over stateful services. It doesn’t just mean choosing a data sharding strategy or whether to decompose by function or object. It means considering how scale fits into the architecture and what it can do for you when it’s needed most.
This is why you should think about architecting scale, not just deploying it. Complexity. Breaking up apps into microservices and APIs means we’re simplifying development but complexifying deployment. Operationally, we’re adding more and more components – like load balancing – to the mix.
Guys from Pinterest: “Sharding makes schema design harder · Waiting too long makes the transition harder”
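As a rough illustration of data partitioning, the sketch below maps a user ID to one of a fixed set of shards with a stable hash. The shard names and the `shard_for` helper are hypothetical, and this is not a description of how Pinterest actually shards.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to DB hosts.
SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Map a user ID to a shard with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would not route consistently.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Note that simple modulo hashing remaps most keys when the number of shards changes, which is one reason the partitioning scheme is worth deciding early.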
It’s important to think about the entire architecture – including the LB – before it’s deployed. How it scales and participates and even monitors is critical for keeping both technical and architectural debt down.
At the application tier, eBay segments different functions into separate application pools. Selling functionality is served by one set of application servers, bidding by another, search by yet another. In total, they organize roughly 16,000 application servers into 220 different pools. This allows them to scale each pool independently of one another, according to the demands and resource consumption of its function. It further allows them to isolate and rationalize resource dependencies - the selling pool only needs to talk to a relatively small subset of backend resources, for example.
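Functional decomposition like this is typically fronted by URL dispatch. The following is a minimal sketch, assuming hypothetical path prefixes and server names loosely modeled on the selling, bidding, and search pools described above; each pool gets its own rotation so it can be sized independently.

```python
from itertools import cycle

# Hypothetical pools, one per function (illustrative names only).
POOLS = {
    "/sell": ["sell-1", "sell-2"],
    "/bid": ["bid-1", "bid-2", "bid-3"],
    "/search": ["search-1"],
}

# One independent rotation per pool, so each pool can be resized on its own.
_rotations = {prefix: cycle(servers) for prefix, servers in POOLS.items()}


def dispatch(path: str) -> str:
    """Return the server that should handle `path`, or raise LookupError."""
    for prefix, rotation in _rotations.items():
        # Match the prefix itself or anything beneath it, but not "/selling".
        if path == prefix or path.startswith(prefix + "/"):
            return next(rotation)
    raise LookupError(f"no pool for {path}")
```

The L7 decision (which pool) comes first; the plain algorithm (which server inside the pool) only applies after that.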
API Façade: Interpolated in Network
The façade provides the level of abstraction necessary for microservices-based apps today. Because each microservice is likely changing on a different schedule, it’s highly disruptive to update the client for each and every one – especially early on. By virtualizing the API using a façade in the network, the API (and thus the client) can remain the same. Significant changes to any of the composite APIs can be reflected in the API façade without impacting everything else.
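One way to picture the façade is as a translation table between a stable public API and whatever the backing services currently look like. The sketch below is an assumption-laden illustration: the `FACADE` table, the service names, and the internal paths are all made up.

```python
# Hypothetical mapping from the stable public API to internal services.
# Only this table changes when a microservice moves or is re-versioned.
FACADE = {
    "/api/orders": ("orders-svc", "/v3/orders"),
    "/api/users": ("identity-svc", "/v1/accounts"),
}


def resolve(public_path: str):
    """Translate a public path into (service, internal_path)."""
    for public_prefix, (service, internal_prefix) in FACADE.items():
        if public_path == public_prefix or public_path.startswith(public_prefix + "/"):
            # Keep whatever follows the public prefix (IDs, sub-resources).
            suffix = public_path[len(public_prefix):]
            return service, internal_prefix + suffix
    raise LookupError(f"{public_path} is not part of the public API")
```

For example, `resolve("/api/orders/42")` yields `("orders-svc", "/v3/orders/42")`; if the orders service later moves to a new internal version, only the table entry changes and clients keep calling `/api/orders`.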
Variations on a Theme
Now we’ve got to consider the rise of things and how they impact scalability architectures. How do we efficiently and effectively design an architecture that scales while taking into consideration performance and security for a whole bunch of different “things” and “users” at the same time?
Monitoring at the first tier can also provide insight into usage patterns of consumers and devices, which means better planning around upgrades, patches, and the like that those consumers and devices may need to access.
Monitoring at the purpose-driven layer gives you insight into specific usage patterns of the API/application, which can help give a better picture of how to segment future versions or new applications.
Monitoring at the last tier is all about speed and capacity; improving performance and ensuring availability. This is where algorithms and monitoring are most important, because both have a serious impact operationally and on the business.
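To show how monitoring and algorithm meet at this tier, here is a small sketch of a "fastest response" picker that keeps an exponentially weighted moving average of each backend's response time. The `LatencyTracker` class and the default `alpha` are assumptions for illustration; real load balancers measure and weight response time in their own ways.

```python
class LatencyTracker:
    """Track a smoothed response time per backend and pick the fastest."""

    def __init__(self, backends, alpha=0.2):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.ewma_ms = {b: None for b in backends}

    def record(self, backend, latency_ms):
        previous = self.ewma_ms[backend]
        if previous is None:
            # First sample for this backend: nothing to smooth against yet.
            self.ewma_ms[backend] = float(latency_ms)
        else:
            self.ewma_ms[backend] = (
                self.alpha * latency_ms + (1 - self.alpha) * previous
            )

    def fastest(self):
        # Backends with no samples yet are tried first so they get measured.
        for backend, value in self.ewma_ms.items():
            if value is None:
                return backend
        return min(self.ewma_ms, key=self.ewma_ms.get)
```

The same numbers that drive the selection are the ones worth exporting as performance metrics, since they describe what users actually experienced.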
You’re architecting scale rather than bolting it on to the front.
Well-designed APIs are consistent and able to be parsed into their respective scalability domains.
Clients should provide some indication (in headers, in data, in URL) of what they are to assist in security and performance and scaling.
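As a sketch of what "some indication" could look like at the dispatch tier, the function below sorts requests into humans, systems, and things. The `X-Client-Type` header and the User-Agent markers are purely hypothetical conventions, not a standard; the point is only that an explicit signal is easier to route on than a guess.

```python
def classify_client(headers: dict) -> str:
    """Return 'thing', 'system', or 'human' from request headers.

    X-Client-Type and the User-Agent markers are illustrative assumptions,
    not a standard.
    """
    # Normalize header names, which are case-insensitive in HTTP.
    h = {k.lower(): v for k, v in headers.items()}

    # Prefer an explicit declaration from the client when it is present.
    declared = h.get("x-client-type", "").lower()
    if declared in {"human", "system", "thing"}:
        return declared

    # Otherwise fall back to a rough guess from the User-Agent.
    agent = h.get("user-agent", "").lower()
    if "sensor" in agent or "iot" in agent:
        return "thing"
    if agent.startswith(("curl/", "python-requests/")) or not agent:
        return "system"
    return "human"
```

The result could then select a pool, a rate limit, or a security policy for that class of client.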
API versioning is a significant source of frustration. Are versions clearly identifiable such that upstream dispatching can assist in a more seamless upgrade/migration experience?
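A minimal sketch of version identification for upstream dispatch follows. It checks the URL first, then a header, then an `Accept` parameter. The `/vN/` path convention, the `X-API-Version` header, the `version=` parameter, and the default of 1 are all assumptions chosen for illustration.

```python
import re

_PATH_VERSION = re.compile(r"^/v(\d+)(/|$)")
_ACCEPT_VERSION = re.compile(r"version=(\d+)")

DEFAULT_VERSION = 1  # assumed fallback for clients that send no version


def api_version(path: str, headers: dict) -> int:
    """Find the requested API version: URL first, then headers, then default."""
    match = _PATH_VERSION.match(path)
    if match:
        return int(match.group(1))

    # Header names are case-insensitive in HTTP, so normalize them.
    h = {k.lower(): v for k, v in headers.items()}
    if h.get("x-api-version", "").isdecimal():
        return int(h["x-api-version"])

    match = _ACCEPT_VERSION.search(h.get("accept", ""))
    if match:
        return int(match.group(1))
    return DEFAULT_VERSION
```

Once the version is a plain integer, the dispatcher can send each version to its own pool and shift traffic gradually during a migration or deprecation.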
Tenancy of microservices has a big impact on the network and scalability. Decide early which way you’re going to go – and whether you’re going to leverage the network to do it.
Right answer is always vi.