Part 2: What you should know about Elasticity, Scalability and Location Transparency in Reactive systems
In the second of three webinars with live Q/A, we look into how organizations with Reactive systems are able to adaptively scale in an elastic, infrastructure-efficient way, and the role that location transparency plays in distributed Reactive systems. Reactive Streams contributor and deputy CTO at Typesafe, Inc., Viktor Klang reviews what you should know about:
How Reactive systems enable near-linear scalability in order to increase performance proportionally to the allocation of resources, avoiding the constraints of bottlenecks or synchronization points within the system
How elasticity builds upon scalability in Reactive systems to automatically adjust throughput to varying demand by proportionally and dynamically adding or removing resources at runtime.
The role of location transparency in distributed computing (in systems running on a single node or on a cluster) and how decoupling runtime instances from their references lets systems embrace network constraints like partial failure, network splits, dropped messages and more.
In the third and final webinar in the series with Jonas Bonér, we go over resiliency, failures vs errors, isolation (and containment), delegation and replication in Reactive systems.
Yesterday → Today
Single machines → Clusters of machines
Single core processors → Multicore processors
Expensive RAM → Cheap RAM
Expensive disk → Cheap disk
Slow networks → Fast networks
Few concurrent users → Lots of concurrent users
Small data sets → Large data sets
Latency in seconds → Latency in milliseconds
“A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added.”
- Werner Vogels
Needs to be async and non-blocking
all the way down
Universal Scalability Law
«N is the number of users or the number of CPUs, α is the contention level, β is the coherency latency, and C is the relative capacity»
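The law itself can be written as C(N) = N / (1 + α(N−1) + βN(N−1)). A minimal sketch of how contention and coherency shape capacity (the α and β values below are illustrative assumptions, not measurements):

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) under the Universal Scalability Law.

    Contention (alpha) gives diminishing returns; coherency cost
    (beta) can make adding nodes *reduce* total capacity."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# With beta = 0 the law reduces to Amdahl's Law: capacity plateaus.
# With beta > 0 capacity peaks and then declines (negative returns).
for n in (1, 8, 32, 128):
    print(n, round(usl_capacity(n, alpha=0.03, beta=0.0005), 2))
```

With these example coefficients, 128 nodes yield *less* capacity than 32 nodes, which is exactly the incoherency penalty the slide warns about.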
The Role of Immutable State
• Great to represent facts
• Messages and Events
• Database snapshots
• Representing the succession of time
• Mutable State is ok if local and contained
• Allows Single-threaded processing
• Allows single writer principle
• Feels more natural
• Publish the results to the world as Immutable State
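The bullets above — immutable facts in, contained local mutation, immutable results out — can be sketched like this (the `Cart`/`ItemAdded` names are illustrative, not from any library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)          # immutable: a fact that has happened
class ItemAdded:
    item: str
    qty: int

class Cart:
    """Mutable state is fine when local and contained: only this
    single-writer object mutates _items. The outside world sees
    only immutable facts (events in, frozen snapshots out)."""
    def __init__(self):
        self._items = {}                       # local, contained mutation

    def handle(self, event: ItemAdded):
        self._items[event.item] = self._items.get(event.item, 0) + event.qty

    def snapshot(self):
        return frozenset(self._items.items())  # publish immutable state

cart = Cart()
for e in (ItemAdded("book", 1), ItemAdded("book", 2)):
    cart.handle(e)
print(cart.snapshot())   # frozenset({('book', 3)})
```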
• Mobile / IoT
• HTTP and Microservices
• “NoSQL” DBs
• Big Data
• Fast Data
Distributed Computing is the new normal
Reality check
• separation in space & time gives us
• communication for coordination
• variable delays
• partial failures
• partial/local/stale knowledge
Peter Deutsch’s 8 Fallacies of Distributed Computing:
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Linearizability
“Under linearizable consistency, all operations appear to have executed atomically in an order that is consistent with the global real-time ordering of operations.”
- Herlihy & Wing 1991
“In general, application developers simply do not implement large scalable applications assuming distributed transactions.”
- Pat Helland, Life beyond Distributed Transactions: an Apostate’s Opinion
The Event Log
• Append-Only Logging
• Database of Facts
• Two models:
• One single Event Log
• Strong Consistency
• Multiple sharded Event Logs
• Strong + Eventual Consistency
EXPERT TRAINING
Delivered on-site for Akka, Spark, Scala and Play
Help is just a click away. Get in touch with Typesafe about our training courses.
• Intro Workshop to Apache Spark
• Fast Track & Advanced Scala
• Fast Track to Akka with Java or Scala
• Fast Track to Play with Java or Scala
• Advanced Akka with Java or Scala
Ask us about local training available from 24 Typesafe partners in 14 countries around the world.
CONTACT US — Learn more about on-site training
Editor's Notes
Scalability is something that I’m very passionate about.
Remember being very fascinated by distributed systems in the first courses at the university.
Guilty of doing CORBA, EJBs, RMI, XA etc.
Learned a lot the hard way—through agony and pain.
Talk:
mixed bag of things that
what works and
doesn’t work—from my point of view.
This is really hard stuff.
But a few good principles & practices
can make all the difference.
Let’s go back in history and see what has changed.
Since the rules of the game have changed—fundamentally.
Not everyone might be aware of it.
Clusters:
We have a dist system from day one. With all its challenges and possibilities. Very different world.
Multicore:
Mutable state used to be ok (von Neumann arch etc.). Today we need better tools, and threads/locks won’t cut it.
RAM:
Opens up for in-memory DB and caching, have the whole data set in memory.
Disk:
No reason to ever delete data—no more RDBMS-style in-place updates. Now we can keep all data around forever. Full history.
Network:
Faster to write to network than to disk. Opens up for new efficient replication strategies.
Lots of users:
Today most apps are put on the Internet with a massive potential user base.
Data:
Massive amounts of data need to be moved around, analyzed and stored
Latency:
Users today are extremely impatient.
…and just around the corner we can expect:
Billions of devices all connected — Internet of Things
Smart cars, health monitors, smart homes, phones
GSM Association predicts: 24 billion devices by 2020
Others think it can be twice that: 50 billion
Computers will be running
100s
or 1000s
or perhaps even 100s of thousands of cores
Need different designs and different tools.
Reactive apps THE answer on the server side
Example:
1980: Cray2 was considered a supercomputer (and very expensive)
2014: iPhone has more computing power (but really cheap)
Cost Gravity (Pieter Hintjens):
Generalization of Moore’s Law
Technology is getting
More and more advanced
At a cheaper and cheaper price
Exponentially
Extremely exciting, but also terrifying
responsive: react to users
The goal for any app should be that it is responsive—at all times:
not just under blue skies
under load & spikes—planned or unplanned
under failure
Responsiveness means that problems may be detected quickly and dealt with effectively
Responsive systems focus on providing rapid and consistent response times
Establishing reliable upper bounds so they deliver a consistent quality of service
The system stays responsive in the face of failure.
=> resilient: react to failure
Resilient means: to spring back into shape, not just being fault-tolerant
often bolted on after using the wrong tools,
part of design from day 1, natural state in lifecycle, manage failure
isolation/containment
avoid cascading failures
repair/heal themselves
The system stays responsive under varying workload
=> elastic: react to load, scale on demand
React to changes in the input rate by increasing or decreasing the resources allocated to service these inputs.
Need designs with no contention points or central bottlenecks
=> ability to shard or replicate components and distribute inputs among them.
Support predictive and adaptive scaling algorithms
Cost-effective use of commodity hardware
message-driven: react to messages
async, non-blocking,
efficient, lazy, push not pull
async boundary =>
loose coupling
isolation/containment + reify errors as messages
location transparency = same model and semantics everywhere
explicit MP enables:
load management, elasticity
flow control, back pressure
brings all the other traits together
A scalable application is able to be expanded according to its usage.
Need to react to increased load
Be adaptive and elastic
Be able to scale up/down and out/in on demand.
Scale on demand
Rapid growth—popularity
Unpredictable spikes and usage patterns—or planned ones
Benefits for businesses
Changing business requirements
Pay for what you use
Cuts costs and minimizes risk of having
too much hardware idling
too little hardware (lose sleep)
Elastic means being able to:
scale on demand
scale up and down
Scalability is an enabler for Elasticity.
Viktor’s comment:
He seems to confuse performance and scalability
My definition:
Performance is the capability of a system to provide a certain response time
Scalability is the capability of a system to maintain that response time as more resources are added to deal with increasing load.
Performance is tangled with three other characteristics:
Latency
Throughput
Scalability
Many different views and definitions.
We need to utilize multicore architectures efficiently.
Memory Management in modern CPUs is very advanced
Cache coherence and invalidation protocols
Prefetching, branch speculation etc.
Hierarchical caches: L1, L2, L3
Haswell processor (in the image):
Cores 2–4, 8—Each core has a:
Local L1 cache 64 KB
Local L2 cache 256 KB
Shared L3 cache 2 MB to 8 MB
With increasingly more latency
(NEXT) So most caches are local
Same with NUMA—Non-Uniform Memory Access
Image of ccNUMA (Cache Coherent NUMA)
Cheap to access local memory on your socket
But very expensive across sockets
Roundtrip between sockets is 40 nanoseconds
Today CPUs are so efficient they normally have to stall, waiting for data
So access to local data is fast
Affects how we think about & design software
CPU doesn’t rely on plain luck—to beat the system
Like Raymond in Rain Man.
Ask it and you get the same reply: “We’re counting cards, counting cards…”
It makes three bets:
Temporal: using regular caching, LRU
Spatial: things close, are likely to be used together
Pattern:
Prefetching that detects patterns in the code
Iterating over an Array—vs a Linked List
Also does Branch Speculation
Can sound complicated and involved
But the good news is…
Clean code matters
Short methods
Single Responsibility Principle, Compose well
Simple logic with little branching
Things used together are put together: No Feature Envy
No clever stuff
Share nothing matters
Local state stays local
Copy state and ship it off instead of sharing and introducing contention
If you think of how modern CPUs work
What really matters is to maximize Locality of Reference.
I.e. locality of data
Keep data close to its processing context
Minimize cache invalidations
How?
No shared mutable state
Co-locate data: Ensure they are on the same cache line.
Ideally pin threads to cores—not possible in Java
Single Writer Principle
Append Only Logs
Smart Batching etc.
Contention is the primary enemy to scalability
So, where is this bastard most likely to show up?
Physical contention points
CPU
Memory
Network IO
File IO
Database IO
Application contention points
Primitives
synchronized blocks, Locks, Barriers, Latches
Optimistic lock-free concurrency: CAS loops—contention can make it hard to make progress
Overuse of volatile variables—contention on the memory bus
Data structures
Shared concurrent data structures
Persistent data structures
Tree—Structural sharing—repointing of root node
Algorithms
Join points
scatter-gather
map-reduce
fork-join
So how should we address contention?
Never. Ever. Block.
Putting threads to sleep when blocking incurs a high wake-up cost
Roughly 650 ns (on Haswell MBP15)
Can run out of threads if blocking
If you need to block
Don’t use a single threaded runtime (Node)
Use sandboxing (protected regions)
Managed blocking—hint to thread pool to allocate new threads
Instead use:
Lock-free concurrency: Optimistic CAS-based
Async message passing (next slide)
Build on a Message-driven core
Use Async Message Passing
Concurrent by design: Concurrency becomes workflow
Just like humans work and communicate
Allows you to model the real world (non-determinism)
Allows loosely coupled systems
Easier to: write, understand, maintain, evolve
Async systems
Initial hit of essential complexity, but..
Low accidental complexity
Complexity stays constant
Compare to synchronous systems
Lower initial essential complexity (familiar)
High accidental complexity
Out of the box tools:
Explicit Queues, MPI
Actors (Akka/Erlang)
Reactive Streams (Rx, Akka Streams)
Future composition
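The notes above — async stages connected by bounded queues, with flow control built in — can be sketched with plain asyncio (stage names here are illustrative, not any particular library's API):

```python
import asyncio

async def producer(out_q):
    for i in range(5):
        await out_q.put(i)          # send a message; never blocks a thread
    await out_q.put(None)           # poison pill: signal completion

async def doubler(in_q, out_q):
    # Each stage owns its state; immutable messages flow between stages.
    while (msg := await in_q.get()) is not None:
        await out_q.put(msg * 2)
    await out_q.put(None)

async def main():
    # Bounded queues give natural back pressure: a fast producer
    # is suspended until the downstream stage catches up.
    a, b = asyncio.Queue(maxsize=2), asyncio.Queue(maxsize=2)
    results = []
    async def consumer():
        while (msg := await b.get()) is not None:
            results.append(msg)
    await asyncio.gather(producer(a), doubler(a, b), consumer())
    return results

print(asyncio.run(main()))   # [0, 2, 4, 6, 8]
```

Actors, Reactive Streams and the like provide the same shape with far richer supervision, routing and back-pressure semantics.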
The simplest way to scale up on multicore is to fully embrace
Share Nothing Architecture
Async message passing
It gives you:
Great Locality of Reference
Minimized Contention
Since you have zero shared state
Uncontended local state
Independent processes communicating using values
So how should we design our algorithms?
Look at how old-timer winners like Caesar did it:
Divide and Conquer
Split up the work in small discrete independent tasks
Ideally Embarrassingly Parallel
No dependencies or coupling
Sequential IO writes are fast
No contention
Single threading can be your friend
Append Only Logging is a great tool
(talk about later in context of CQRS)
Smart Batching pattern (Martin Thompson)
THEN Use pipelining—stages with messages flowing between
2 types:
Can be synchronous
Can be asynchronous
Usually a combination
Ideally run on a single thread
No cache invalidations and copying of memory
Minimized contention
Can not block or the pipeline stalls
Single threaded pipelines are all good, IF
You can max out on your CPU
If not, introduce async stages—to increase parallelization.
Need to have built-in back pressure and flow control
Ideally done by the library:
Akka Streams optimisation through stream fusion
Tools:
SEDA, Actors
Disruptor, CSP
Futures or Reactive Streams
Contention: waiting or queueing for shared resources
Coherency: delay for data to become consistent
Amdahl's Law:
- the EFFECT contention has on a PARALLEL system
- CONTENTION gives DIMINISHING returns
Universal Scalability Law:
- ADDS Coherency
- INCOHERENCY can give NEGATIVE results
- Coherency == 0 => Amdahl’s Law
The 3 C’s: Concurrency, Contention, Coherency
Beta = 0 == Amdahl’s Law
To quote my dear friend The Legend of Klang….(NEXT)
As we all know, Immutability has immense value
Stable values, code that you can trust etc.
Lots of talking about immutable state and its role in building concurrent scalable systems
(NEXT) On a more serious level…
Great to represent Facts
Things that have happened
Values
Events
Database snapshots
Less ideal for a “working” data set
Persistent data structures can increase contention
Uses structural sharing with repointing on updates
Contention at the root node
Instead use a Share Nothing Architecture
with mutable state within each isolated processing unit
and immutable state sent between—events
But to truly scale on demand
We need to Scale OUT
We need Elasticity
We need to be able to add processing power—and a single node can’t give us that.
We need elasticity and efficient utilization of cluster and cloud computing architectures
Distributed systems are the new NORMAL.
We have them whether we want them or not…
Deal with it.
Alright, so do we all agree that in what we call Reality, we have multiple dimensions?
What things do we get from that?
Comm for Coo:
So given that entities do not exist in the same place, it means that they need to communicate if they want to coordinate -anything-.
Delays:
Ever observed a race-condition that as you tried to fix it just became less likely?
That’s shortened delays—making the window of opportunity smaller but still possible.
Partial failures:
Since things do not exist in the same location, they—especially if collaborating on something—risk failing individually: one succeeds while another fails, for example.
Knowledge:
Since communication is how we coordinate, it is also how we coordinate -information-, and since we have delays and partial failures, we will only ever have a subjective view of the world, one that is bound to be incomplete and stale.
In a distributed system you have
isolated machines, nodes, JVMs
You can’t possibly share memory
Which means that we need to
communicate asynchronously
using messages
Also, there is a network between.
which makes communication expensive
which is inherently unreliable
Does not just apply to nodes, but to
Clusters
Racks
Data centers
Distributed Computing is REALLY HARD.
But as we will see, solid principles can make it manageable.
But first let’s pay a visit to my own little graveyard of dist systems.
We need to learn from history’s mistakes
UNLEARN bad habits
…
So, what should we do instead?
We talked a lot about data locality
Well, it matters even more in a distributed system
Even more expensive to:
move data around repeatedly
ensure integrity of data
But let’s start with some theory.
Three models for consistency
Strong consistency
Eventual consistency
Weak consistency (not of much practical use)
Strong is defined by Linearizability
Less formally:
“A read will return the last completed write (made on any replica)”
Very strong (and expensive) guarantees
Sometimes needed
Minimize the dataset
Strong consistency protocols
Viewstamped Replication (Oki & Liskov 1988)
Paxos (Lamport 1989)
ZAB—Zookeeper Atomic Broadcast (Reed 2008)
Raft (Ongaro & Ousterhout 2013)
Partition tolerant (can make progress as long as a majority of replicas are reachable)
Dynamic master
High latency
Medium throughput
These protocols are hard to scale.
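The quorum arithmetic behind that partition-tolerance claim can be sketched as follows (the helper names are mine, not from any of the protocols above):

```python
def majority(n_replicas: int) -> int:
    """Smallest quorum size such that any two quorums must overlap
    in at least one replica -- the basis of Paxos/Raft safety."""
    return n_replicas // 2 + 1

def tolerated_failures(n_replicas: int) -> int:
    """Majority-quorum protocols make progress while a majority is
    reachable, so this many replicas may fail or be partitioned away."""
    return n_replicas - majority(n_replicas)

# Odd cluster sizes are the sweet spot: 4 replicas tolerate no more
# failures than 3, but cost an extra node.
for n in (3, 4, 5, 7):
    print(n, majority(n), tolerated_failures(n))
```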
RDBMSs provide strong consistency but are hard to scale
In general Strong Consistency is
Very Expensive
But sometimes needed
Minimize the dataset
THINK about your data.
Different data has different needs in terms of guarantees.
Coordination is the main killer of scalability in a cluster
Latency is higher, coordination cost is higher.
Coherence cost is higher.
Important discovery: CAP Conjecture by Eric Brewer 2000; proof by Gilbert & Lynch 2002
Consistency, Availability, Partition Tolerance => pick 2
Linearizability is impossible under network partitions
CA systems do not exist
In retrospect:
Very influential—but very narrow scope
“[CAP] has lead to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in the HAT paper
Linearizability is very often not required
Ignores latency—but in practice latency & partitions are deeply related
Partitions are rare—so why sacrifice C or A all the time?
Not black and white—can be fine-grained and dynamic
Read ‘CAP Twelve Years Later’ - Eric Brewer
But amazing work that influenced
the NOSQL movement and
Eventual Consistency
Eventual consistency—Essentially
Minimized Coordination
More headroom for Scalability & Availability
Definition: The storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.
Popularized by Amazon’s Dynamo
What’s behind Amazon’s shopping cart, EC2 and more
Epidemic Gossip using Vector Clocks
Failure detection
Consistent Hashing
Influenced: DynamoDB, Riak, Voldemort, Cassandra
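Dynamo-style replicas use vector clocks to decide whether one update happened before another or the two are concurrent. A minimal sketch, with clocks as plain dicts mapping node id to counter (an illustration, not Dynamo's actual wire format):

```python
def vc_merge(a: dict, b: dict) -> dict:
    """Pointwise max: the merged clock dominates both inputs."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_happened_before(a: dict, b: dict) -> bool:
    """a -> b iff a <= b pointwise and a != b. If neither clock
    dominates the other, the updates are concurrent (a conflict)."""
    return all(a.get(k, 0) <= b.get(k, 0) for k in a.keys() | b.keys()) and a != b

r1 = {"node-a": 2, "node-b": 1}
r2 = {"node-a": 1, "node-b": 3}
# Neither dominates: concurrent writes, so the app (or a CRDT) must merge.
assert not vc_happened_before(r1, r2) and not vc_happened_before(r2, r1)
print(sorted(vc_merge(r1, r2).items()))   # [('node-a', 2), ('node-b', 3)]
```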
Most DBs are only Key/Value stores
BUT CRDTs provides richer Eventually Consistent Data Types
Great tool for
For minimal coordination in the cluster
Eventually consistent RICH datatypes
Registers, Maps, Sets, Graphs, etc.
Need a Monotonic merge function
2 types:
CvRDT—convergent—state-based
keep all history in the data type—like a vector clock
clients can go offline
eventually converge as long as all changes eventually reaches all replicas
has a garbage collection problem—GC needs full consistency
CmRDT—commutative—operations-based
send all state-changing operations to all replicas
needs a reliable broadcast channel
no garbage problem
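As a concrete CvRDT, here is a grow-only counter with a monotonic merge function (a sketch of the idea, not any particular CRDT library):

```python
class GCounter:
    """State-based (CvRDT) grow-only counter: each replica increments
    only its own slot. Merge is a pointwise max -- commutative,
    associative and idempotent -- so replicas converge regardless of
    the order or number of times states are exchanged."""
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}

    def increment(self, n: int = 1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter"):
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3); b.increment(2)
a.merge(b); b.merge(a)            # gossip in either direction
print(a.value(), b.value())       # 5 5
```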
But, HOW can we Scale yet provide Transactional Integrity?
Start by reading this paper, then read it again.
Can’t use distributed transactions.
So what should we use?
Let’s look at a few building blocks for making this possible.
First: Explicitly model state transitions in Domain Events
Think in Facts
Things that have completed
Always Immutable
Can’t change the past
Verbs in past tense
CustomerRelocated
CargoShipped
InvoiceSent
Second: Use an Event Log:
The Event Log persists Domain Events
Can apply the Single Writer Principle
Append-Only Logging: AOL
Can log to
Local
Memory Mapped files (ByteBuffers in Java)
File based Journals (LevelDB etc.)
Replicated
Homegrown replicated versions (using Paxos/Raft)
Like Greg Young’s EventStore
Fully replicated NOSQL DB backends
Or regular SQL DBs
Read The Log by Jay Kreps
Stores Facts: have already happened
The log is a DB of Facts—immutable Domain Events
Knowledge only grows
Never delete anything
Accountants never delete anything: they keep it in the Ledger
Can look at it from the perspective of two different models:
1 single event log—Datomic, Oracle TX Log
Single fully consistent snapshot of DB
Reads are “free”
Limited scalability
Multiple sharded event logs—Event Sourcing
Multiple internally consistent views
Aggregate Root is consistency boundary
Strong Consistency within AR
Eventual Consistency between AR
=> Joins are eventually consistent
Unlimited scalability
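Putting the pieces together — past-tense Domain Events, single writer, append-only persistence — an Event Log can be sketched like this (the class and event names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)            # a Fact: immutable, verb in past tense
class CargoShipped:
    cargo_id: str
    destination: str

class EventLog:
    """Append-only log of Domain Events: knowledge only grows and
    nothing is ever deleted. Current state is a fold over the log."""
    def __init__(self):
        self._events = []           # only the single writer appends here

    def append(self, event):
        self._events.append(event)

    def replay(self):
        return tuple(self._events)  # immutable view for readers

log = EventLog()
log.append(CargoShipped("c-1", "Rotterdam"))
log.append(CargoShipped("c-2", "Oslo"))
shipped = [e.cargo_id for e in log.replay()]
print(shipped)   # ['c-1', 'c-2']
```

With multiple sharded logs, each Aggregate Root would own one such log (strongly consistent internally), and views joining across them would be eventually consistent.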
By now I hope it is clear that the simplest way to scale out is to fully embrace
Share Nothing
Async message passing
You have zero shared state => Uncontended local state
Independent processes communicating using Values
Gives us what we need:
Great Locality of Reference
Minimized Contention/Coordination
If possible—use CRDTs to model shared state
The KEY here is: Location Transparency
Should not be underestimated
It is not transparent distributed computing
Does not violate Waldo’s ‘A Note On Distributed Computing’
But the opposite:
Explicit distributed computing
Local communication is an optimization
Embrace the Network and the essence of it:
Locality of data
Async message passing
This gives you a:
One model
one thing to learn and understand
with one set of semantics
regardless if we scale UP or OUT
Instead of having to use two completely different models…
Runtime that can optimize communication by improving
Locality
Communication
Adaptive routing protocols—gather metrics and act
What I’ve tried to highlight in this talk is that You can think of Scalability very much like Escher’s painting Print Gallery
Small to large—at every level
It is basically the same
Small “machines” with local memory
Communicating with async messages
The same design principles can be used to solve the problem at any level
Regarding the video:
Animation of Escher’s Print Gallery
The original painting had a blank hole in the middle.
Left a few questions:
What is missing? What is really in this hole?
Why did Escher not paint it out? What was the problem?
Escher left sketches of how he drew the perspectives—mathematically
Can be explained and completed mathematically (Droste effect)
Escher had an incredible mathematical intuition
Read more here: http://escherdroste.math.leidenuniv.nl/index.php?menu=intro
If we apply this way of looking at things to systems.
It’s all separate “machines” or “units”
with local memory communicating
with async message passing
Embrace this fact.
So…
To make it scale on
multiple independent processing units
all with local memory
communicating with async message passing
The same challenges and (conceptually) the same solutions
The techniques and technologies will vary.
But the principles stays the same:
Share Nothing Architecture
Building on a Message-driven foundation.
Decoupling in Time and Space
Location Transparency