3. Companies continuously grow
More and more data and traffic
More and more computing resources needed
SOLUTION
SCALING
15/12/2012
Scalability – The need for speed
4. vertical scalability = scale up
single server
performance ⇒ more resources (CPUs, storage, memory)
volumes increase ⇒ more difficult and expensive to scale
not reliable: individual machine failures are common
horizontal scalability = scale out
cluster of servers
performance ⇒ more servers
cheaper hardware (more likely to fail)
volumes increase ⇒ complexity ~ constant, costs ~ linear
reliability: CAN operate despite failures
complex: use only if benefits are compelling
6. All data on a single node
Use cases
data usage = mostly processing aggregates
many graph databases
Pros/Cons
RDBMSs or NoSQL databases
simplest and most often recommended option
only vertical scalability
Scalability – Vertical scalability
8. Shared everything
every node has access to all data
all nodes share memory and disk storage
used on some RDBMSs
Scalability – Horizontal scalability: architectures and distribution models
9. Shared disk
every node has access to all data
all nodes share disk storage
used on some RDBMSs
10. Shared nothing
nodes are independent and self-sufficient
no shared memory or disk storage
used on some RDBMSs and all NoSQL databases
11. Sharding
different data put on different nodes
Replication
same data copied over multiple nodes
Sharding + replication
the two orthogonal techniques combined
12. Different parts of the data onto different nodes
data accessed together (aggregates) are on the same node
clumps arranged by physical location, to keep load even,
or according to any domain-specific access rule
[Diagram: three shards, each serving reads (R) and writes (W) for its own subset of the data: {A, F, H}, {B, E, G}, {C, D, I}]
13. Use cases
different people access different parts of the dataset
to horizontally scale writes
Pros/Cons
“manual” sharding with every RDBMS or NoSQL store
better read performance
better write performance
low resilience: a node failure makes its shard's data unavailable (only the other nodes' data remains accessible)
high licensing costs for RDBMSs
difficult or impossible cluster-level operations
(querying, transactions, consistency controls)
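The routing idea behind sharding can be sketched in a few lines. This is an illustrative, hypothetical example (the node names and hash choice are assumptions, not any particular product's scheme): each key is hashed to pick a shard, so every client computes the same key-to-node mapping.

```python
import hashlib

# Hypothetical sketch: route each key to one of three shards by hashing.
SHARDS = ["node-1", "node-2", "node-3"]  # assumed cluster nodes

def shard_for(key: str) -> str:
    """Pick a shard deterministically from the key's hash."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

# Every client computes the same mapping, so reads and writes for a
# given key always hit the same node.
assert shard_for("user:42") == shard_for("user:42")
```

Note that with pure hash routing, a node failure makes exactly that node's keys unreachable, which is the low-resilience trade-off listed above.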
14. Data replicated across multiple nodes
One designated master (primary) node
• contains the original
• processes writes and passes them on
All other nodes are slave (secondary)
• contain the copies
• synchronized with the master during a replication process
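A minimal sketch of this read/write routing, assuming in-memory dictionaries stand in for nodes and replication is synchronous (both simplifications): writes go only to the master and are passed on to the slaves; reads are load-balanced across the slaves.

```python
import random

# Illustrative master-slave cluster (names and structure are assumptions).
class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master   # processes writes, holds the original
        self.slaves = slaves   # hold the copies

    def write(self, key, value):
        self.master[key] = value
        for slave in self.slaves:   # replication step (synchronous here)
            slave[key] = value

    def read(self, key):
        # reads can be served by any slave
        return random.choice(self.slaves).get(key)

cluster = MasterSlaveCluster({}, [{}, {}])
cluster.write("a", 1)
assert cluster.read("a") == 1
```

Real systems replicate asynchronously, which is exactly what opens the read-inconsistency window discussed below.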
16. Use cases
load balancing cluster: data usage mostly read-intensive
failover cluster: single server with hot backup
Pros/Cons
better read performance
worse write performance (every write must go through the master and be propagated to the slaves)
high read (slave) resilience:
master failure ⇒ slaves can still handle read requests
low write (master) resilience:
master failure ⇒ no writes until the old master recovers or a new master is appointed
read inconsistencies: update not propagated to all slaves
master = bottleneck and single point of failure
high licensing costs for RDBMSs
17. Data replicated across multiple nodes
All nodes are peer (equal weight): no master, no slaves
All nodes can both read and write
19. Use cases
load balancing cluster: data usage read/write-intensive
need to scale out more easily
Pros/Cons
better read performance
better write performance
high resilience:
node failure ⇒ reads/writes handled by other nodes
read inconsistencies: update not propagated to all nodes
write inconsistencies: two nodes update the same record at the same time (write-write conflict)
high licensing costs for RDBMSs
20. Sharding + master-slave replication
multiple masters
each data item has a single master
node configurations:
• master
• slave
• master for some data / slave for other data
Sharding + peer-to-peer replication
23. Oracle Database
Oracle RAC: shared everything
Microsoft SQL Server
All editions: shared nothing; master-slave replication
IBM DB2
DB2 pureScale: shared disk
DB2 HADR: shared nothing; master-slave replication (failover cluster)
24. Oracle MySQL
MySQL Cluster: shared nothing; sharding, replication, sharding + replication
The PostgreSQL Global Development Group PostgreSQL
PGCluster-II: shared disk
Postgres-XC: shared nothing; sharding, replication, sharding + replication
26. Inconsistent write = write-write conflict
multiple writes of the same data at the same time
(highly likely with peer-to-peer replication)
Inconsistent read = read-write conflict
read in the middle of someone else’s write
Scalability – Horizontal scalability: consistency
27. Pessimistic approach
prevent conflicts from occurring
Optimistic approach
detect conflicts and fix them
28. Implementation
write locks ⇒ acquire a lock before updating a value
(only one lock at a time can be taken)
Pros/Cons
often severely degrade system responsiveness
often leads to deadlocks (hard to prevent/debug)
rely on a consistent serialization of the updates*
* sequential consistency
ensuring that all nodes apply operations in the same order
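The pessimistic approach can be sketched with a single in-process lock (an illustrative simplification; distributed write locks are far more involved): a per-record lock serializes updates so two writers never interleave on the same value.

```python
import threading

# Illustrative pessimistic locking: the lock prevents lost updates.
balance = {"acct": 100}
lock = threading.Lock()

def withdraw(amount):
    with lock:                      # acquire the write lock before updating
        if balance["acct"] >= amount:
            balance["acct"] -= amount

# Five concurrent withdrawals of 10 each: the lock serializes them.
threads = [threading.Thread(target=withdraw, args=(10,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert balance["acct"] == 50
```

The cost is visible even here: every writer waits its turn, which is the responsiveness degradation noted above.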
29. Implementation
conditional updates ⇒ test a value before updating it
(to see if it's changed since the last read)
merged updates ⇒ merge conflicted updates somehow
(save updates, record conflict and merge somehow)
Pros/Cons
conditional updates
rely on a consistent serialization of the updates*
* sequential consistency
ensuring that all nodes apply operations in the same order
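A conditional update is essentially a compare-and-set. This hedged sketch (the record layout and version field are assumptions for illustration) re-tests the version seen at read time; if another writer got there first, the update is rejected instead of silently overwriting.

```python
# Illustrative optimistic concurrency via version checking.
record = {"value": "v1", "version": 1}

def conditional_update(expected_version, new_value):
    """Apply the update only if nobody wrote since our read."""
    if record["version"] != expected_version:
        return False                  # conflict detected: caller must retry/merge
    record["value"] = new_value
    record["version"] += 1
    return True

v = record["version"]                   # read the current version
assert conditional_update(v, "v2")      # succeeds: no intervening write
assert not conditional_update(v, "v3")  # fails: version is now stale
```

On rejection, the application decides whether to retry with fresh data or merge the conflicting updates, which is the "merged updates" option above.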
30. Logical consistency
different data make sense together
Replication consistency
same data ⇒ same value on different replicas
Read-your-writes consistency
users continue seeing their updates
31. ACID transactions ⇒ aggregate-ignorant DBs
Partially atomic updates ⇒ aggregate-oriented DBs
atomic updates within an aggregate
no atomic updates between aggregates
updates of multiple aggregates: inconsistency window
replication can lengthen inconsistency windows
32. Eventual consistency
nodes may have replication inconsistencies:
stale (out of date) data
eventually all nodes will be synchronized
33. Session consistency
within a user’s session there is read-your-writes consistency
(no stale data read from a node after an update on another one)
consistency lost if
• session ends
• the system is accessed simultaneously from different PCs
implementations
• sticky session/session affinity = sessions tied to one node
affects load balancing
quite intricate with master-slave replication
• version stamps
track latest version stamp seen by a session
ensure that all interactions with the data store include it
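The version-stamp implementation can be sketched as follows (an illustrative model, not a real store's API: `Replica` and `read_your_writes` are invented names): the session remembers the highest stamp it has seen, and a replica serves the read only if it has caught up to that stamp.

```python
# Illustrative version stamps for read-your-writes consistency.
class Replica:
    def __init__(self):
        self.stamp = 0   # highest replicated version on this node
        self.data = {}

def read_your_writes(session_stamp, replica, key):
    """Serve the read only from a replica that has seen the session's writes."""
    if replica.stamp < session_stamp:
        raise RuntimeError("replica stale for this session; try another node")
    return replica.data.get(key)

master = Replica()
master.data["x"] = "new"
master.stamp = 7          # the session's write carried stamp 7
lagging = Replica()
lagging.stamp = 5         # slave not yet synchronized

assert read_your_writes(7, master, "x") == "new"
try:
    read_your_writes(7, lagging, "x")
except RuntimeError:
    pass  # the stale replica correctly refused the read
```

Unlike sticky sessions, this lets any sufficiently up-to-date node serve the request, so load balancing is not constrained to one node.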
35. Consistency
all nodes see the same data at the same time
Latency
the response time in interactions between nodes
Availability
every nonfailing node must reply to requests
in practice, bounded by the latency we are prepared to tolerate:
once latency gets too high, we give up and treat the data as unavailable
Partition tolerance
the cluster can survive communication breakages
(separating it into partitions unable to communicate with each other)
Scalability – Horizontal scalability: CAP theorem
36. Transaction to transfer $50 from account A to account B:
1) read(A)
2) A = A – 50
3) write(A)
4) read(B)
5) B = B + 50
6) write(B)
Atomicity
• transaction fails after 3 and before 6 ⇒ the system should
ensure that its updates are not reflected in the database
Consistency
• A + B is unchanged by the execution of the transaction
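The atomicity requirement for this transfer can be sketched in a few lines (illustrative, with in-memory accounts and snapshot rollback standing in for a real transaction log): if the transaction fails after write(A) but before write(B), its partial update must be undone.

```python
# Illustrative atomic transfer: all-or-nothing via snapshot rollback.
accounts = {"A": 100, "B": 200}

def transfer(amount, fail_midway=False):
    snapshot = dict(accounts)        # remember state for rollback
    try:
        accounts["A"] -= amount      # steps 1-3: read(A), A = A - 50, write(A)
        if fail_midway:
            raise RuntimeError("crash between write(A) and write(B)")
        accounts["B"] += amount      # steps 4-6: read(B), B = B + 50, write(B)
    except Exception:
        accounts.clear()
        accounts.update(snapshot)    # undo the partial update: atomicity
        raise

try:
    transfer(50, fail_midway=True)
except RuntimeError:
    pass
assert accounts == {"A": 100, "B": 200}   # A + B unchanged: consistency holds
transfer(50)
assert accounts == {"A": 50, "B": 250}
```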
37. Transaction to transfer $50 from account A to account B (same steps 1-6 as above)
Isolation
• another transaction reading between 3 and 6 will see inconsistent data
(A + B will be less than it should be)
• Isolation can be ensured trivially by running transactions
serially ⇒ performance issue
Durability
• user notified that transaction completed ($50 transferred)
⇒ transaction updates must persist despite failures
38. Basically Available
Soft state
Eventually consistent
Soft state and eventual consistency are techniques that work
well in the presence of partitions and thus promote availability
39. Given the three properties of
Consistency, Availability and
Partition tolerance,
you can only get two
40. C
being up and keeping consistency is reasonable
A
a single node: if it's up, it's available
P
a single machine can’t partition
41. AP (give up C)
partition ⇒ an update on one node = inconsistency
42. CP (give up A)
partition ⇒ consistency preserved only if some nonfailing
node stops replying to requests
43. CA (give up P)
nodes communicate ⇒ C and A can be preserved
partition ⇒ all nodes on one side must be turned off
(failed nodes do not violate availability, which only binds nonfailing nodes)
difficult and expensive
44. ACID databases
focus on consistency first and availability second
BASE databases
focus on availability first and consistency second
45. Single server
no partitions
consistency versus performance: relaxed isolation
levels or no transactions
Cluster
consistency versus latency/availability
durability versus performance (e.g. in memory DBs)
durability versus latency (e.g. the master
acknowledges the update to the client only after
it has been acknowledged by some slaves)
46. strong write consistency ⇒ write to the master
strong read consistency ⇒ read from the master
47. N = replication factor
(the nodes involved in replicating a given data item, NOT all nodes in the cluster)
W = nodes confirming a write
R = nodes needed for a consistent read
write quorum: W > N/2
read quorum: R + W > N
Consistency is on a per operation basis
Choose the most appropriate combination of
problems and advantages
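The quorum conditions above translate directly into two checks (a minimal sketch of the arithmetic, not any product's configuration API):

```python
# Quorum conditions with N = replication factor,
# W = nodes confirming a write, R = nodes contacted for a read.
def has_write_quorum(w, n):
    return w > n / 2        # a majority must confirm each write

def has_read_quorum(r, w, n):
    return r + w > n        # read and write sets must overlap in >= 1 node

# N = 3: W = 2, R = 2 gives strongly consistent operations
assert has_write_quorum(2, 3) and has_read_quorum(2, 2, 3)
# W = 1, R = 1 trades consistency for latency: the quorums no longer hold
assert not has_read_quorum(1, 1, 3)
```

Because R and W can be chosen per request, an application can pay the quorum cost only on the operations that actually need strong consistency.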