No stress with state

No stress with state
Hands-on tips for scalable applications

Uwe Friedrichsen, codecentric AG, 2014

@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com

Your system is only scalable
if you can easily scale up
and down

State is the enemy of
scalable applications

State …

•  … avoids load distribution
•  … requires replication
•  … makes systems brittle
•  … makes scaling down hard

General design rule

•  Move state either to the frontend and/or to the backend
•  Make everything else stateless
•  Use caches if that’s too slow (but be aware of their tradeoffs)
Frontend
Client
Backend
Store
Application
Stateful
Stateful
Stateless

Frontend client state examples

•  Cookies (fondly handcrafted)
•  Play framework
•  JavaScript client
•  HTML5 storage

And what if we are bound to
a web framework that
heavily uses session state?

Standard session state handling

•  Sticky Sessions
•  Session Replication

Sticky Sessions
Load
Balancer
Session
Node
4711
2
0815
4
…
…
Session: 4711
Node: 2

Sticky Sessions
Load
Balancer
Session
Node
4711
2
0815
4
…
…
Session: 4711
Problem: Node is down
✖
New Node: 4
Node 4 does not
know session 4711

Sticky Sessions
Load
Balancer
Session
Node
4711
2
0815
4
…
…
Problem: Load is unequally distributed

Sticky Sessions
Load
Balancer
Session
Node
4711
2
0815
4
…
…
Problem: Scale down
#Sessions: 1
#Sessions: 1
#Sessions: 1
#Sessions: 1
#Sessions: 1
Cannot shut down node
without losing session

Session Replication
Load
Balancer
Request
Arbitrary
node
Session replication

But what if we really
need to scale?

Lazy session replication
leaves you with inconsistencies
Load
Balancer
Eager session replication
does not scale well

What we really want

•  Scaling (up) behavior of sticky sessions
•  Node independence of session replication

à Isolated nodes with shared session cache

Isolated nodes with shared session cache
Load
Balancer
Request
Arbitrary
node
Application nodes
Cache nodes

Isolated nodes with shared session cache

•  Easy load distribution
•  Easy up- and down-scaling
•  Tolerant against node failures
•  Independent scaling of application and cache nodes

But isn’t that way too slow?

Scalability vs. Latency

You need to give up some latency for an individual user 
to provide acceptable latency for many users
Latencydistribution
Not scalable
Acceptable latency
Single user
Many users
Scalable
Acceptable latency
Latencydistribution
Single user
Many users

Interception points

•  StandardSession (Tomcat)
•  HttpSessionAttributeListener (JEE)
•  Valve (Tomcat)
•  Filter (JEE)
•  SessionManager (ManagerBase, Tomcat)

Available libraries

•  For Redis
•  For MemCached
•  For MongoDB
•  For Cassandra
•  …

Example library

•  Redis Session Manager for Apache Tomcat 
https://github.com/jcoleman/tomcat-redis-session-manager
•  A bit outdated, but easy to use
•  Usage
•  Install and start Redis
•  Copy tomcat-redis-session-manager.jar and 
jedis-<version>.jar to Tomcat lib directory
•  Update Tomcat configuration
•  Restart Tomcat

context.xml
<Context>
...
<Valve className="com.radiadesign.catalina.session.RedisSessionHandlerValve" />
<Manager className="com.radiadesign.catalina.session.RedisSessionManager"
host="localhost”
port="6379"
database="0”
maxInactiveInterval="60" />
</Context>

Tradeoffs

•  Easy to set up and use
•  No code required, only configuration
•  Works smoothly

•  Not best for web frameworks 
that store lots of session state
•  Does not work well with AJAX
•  Does not work with single page applications
 
à Another reason to prefer ROCA style?

Challenges of scalable datastores

•  Requires relaxed consistency guarantees
•  Leads to (temporal) anomalies and inconsistencies 
short-term due to replication or long-term due to partitioning

It might happen. Thus, it will happen!

C
Consistency
A
Availability
P
Partition Tolerance
Strict Consistency

ACID / 2PC
Strong Consistency

Quorum R&W / Paxos
Eventual Consistency

CRDT / Gossip / Hinted Handoff

Strict Consistency (CA)

•  Great programming model 
no anomalies or inconsistencies need to be considered
•  Does not scale well 
best for single node databases

„We know ACID – It works!“
„We know 2PC – It sucks!“

Use for moderate data amounts

And what if I need more data?

•  Distributed datastore
•  Partition tolerance is a must
•  Need to give up strict consistency (CP or AP)

Strong Consistency (CP)

•  Majority based consistency model 
can tolerate up to N nodes failing out of 2N+1 nodes
•  Good programming model 
Single-copy consistency
•  Trades consistency for availability 
in case of partitioning

Paxos (for sequential consistency)
Quorum-based reads & writes

And what if I need more availability?

•  Need to give up strong consistency (CP)
•  Relax required consistency properties even more
•  Leads to eventual consistency (AP)

Eventual Consistency (AP)

•  Gives up some consistency guarantees 
no sequential consistency, anomalies become visible
•  Maximum availability possible 
can tolerate up to N-1 nodes failing out of N nodes
•  Challenging programming model 
anomalies usually need to be resolved explicitly

Gossip / Hinted Handoffs
CRDT

Conflict-free Replicated Data Types

•  Eventually consistent, self-stabilizing data structures
•  Designed for maximum availability
•  Tolerates up to N-1 out of N nodes failing

State-based CRDT: Convergent Replicated Data Type (CvRDT)
Operation-based CRDT: Commutative Replicated Data Type (CmRDT)

Convergent Replicated Data Type

State-based CRDT – CvRDT

•  All replicas (usually) connected
•  Exchange state between replicas, calculate new state on target replica
•  State transfer at least once over eventually-reliable channels
•  Set of possible states form a Semilattice
•  Partially ordered set of elements where all subsets have a Least Upper Bound (LUB)
•  All state changes advance upwards with respect to the partial order

Commutative Replicated Data Type

Operation-based CRDT - CmRDT

•  All replicas (usually) connected
•  Exchange update operations between replicas, apply on target replica
•  Reliable broadcast with ordering guarantee for non-concurrent
updates
•  Concurrent updates must be commutative

That‘s been enough theory ...

Op-based Counter

Data
Integer i
Init
i ≔ 0
Query
return i
Operations
increment(): i ≔ i + 1
decrement(): i ≔ i - 1

State-based G-Counter (grow only)
(Naïve approach)

Data
Integer i
Init
i ≔ 0
Query
return i
Update
increment(): i ≔ i + 1
Merge( j)
i ≔ max(i, j)

(Naïve approach)
R1
R3
R2
i = 1
U
i = 1
U
i = 0
i = 0
i = 0
I
I
I
M
i = 1
i = 1
M

(Vector-based approach)

Data
Integer V[] / one element per replica set
Init
V ≔ [0, 0, ... , 0]
Query
return ∑i V[i]
Update
increment(): V[i] ≔ V[i] + 1 / i is replica set number
Merge(V‘)
∀i ∈ [0, n-1] : V[i] ≔ max(V[i], V‘[i])

(Vector-based approach)
R1
R3
R2
V = [1, 0, 0]
U
V = [0, 0, 0]
I
I
I
V = [0, 0, 0]
V = [0, 0, 0]
U
V = [0, 0, 1]
M
V = [1, 0, 0]
M
V = [1, 0, 1]

State-based PN-Counter (pos./neg.)

•  Simple vector approach as with G-Counter does not work
•  Violates monotonicity requirement of semilattice
•  Need to use two vectors
•  Vector P to track incements
•  Vector N to track decrements
•  Query result is ∑i P[i] – N[i]

State-based PN-Counter (pos./neg.)

Data
Integer P[], N[] / one element per replica set
Init
P ≔ [0, 0, ... , 0], N ≔ [0, 0, ... , 0]
Query
Return ∑i P[i] – N[i]
Update
increment(): P[i] ≔ P[i] + 1 / i is replica set number
decrement(): N[i] ≔ N[i] + 1 / i is replica set number
Merge(P‘, N‘)
∀i ∈ [0, n-1] : P[i] ≔ max(P[i], P‘[i])
∀i ∈ [0, n-1] : N[i] ≔ max(N[i], N‘[i])

Non-negative Counter

Problem: How to check a global invariant with local information only?

•  Approach 1: Only dec if local state is > 0
•  Concurrent decs could still lead to negative value

•  Approach 2: Externalize negative values as 0
•  inc(negative value) == noop(), violates counter semantics

•  Approach 3: Local invariant – only allow dec if P[i] - N[i] > 0
•  Works, but may be too strong limitation

•  Approach 4: Synchronize
•  Works, but violates assumptions and prerequisites of CRDTs

More datatypes

•  Register
•  Set
•  Dictionary (Map)
•  Tree
•  Graph
•  Array
•  List

plus several representations for each datatype

Limitations of CRDTs

•  Very weak consistency guarantees 
Strives for „quiescent consistency“
•  Eventually consistent 
Not suitable for high-volume ID generator or alike
•  Not easy to understand and model
•  Not all data structures representable

Use if availability is extremely important

Further reading on CRDTs

1.  Shapiro et al., Conflict-free Replicated Data
Types, Inria Research report, 2011
2.  Shapiro et al., A comprehensive study of
Convergent and Commutative Replicated
Data Types, 
Inria Research report, 2011
3.  Basho Tag Archives: CRDT, 
https://basho.com/tag/crdt/
4.  Leslie Lamport, 
Time, clocks, and the ordering of events in a
distributed system, 
Communications of the ACM (1978)

Wrap-up

•  Scalability means scaling up and down
•  State is the enemy of scalability
•  Move state to the frontend and backend
•  Shared session state layer
•  CRDTs

The best way to deal with state is to avoid it

No stress with state

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (18)

Similar to No stress with state

Similar to No stress with state (20)

More from Uwe Friedrichsen

More from Uwe Friedrichsen (10)

Recently uploaded

Recently uploaded (20)

No stress with state