SlideShare a Scribd company logo
1 of 44
Download to read offline
CRDTs and Redis
From sequential to concurrent executions
Carlos Baquero
HASLab, INESC TEC & Univ. Minho, Portugal
April 26th 2018
1/32
2/32
The speed of communication in the 19th century
W. H. Harrison’s death
“At 12:30 am on April 4th, 1841 President
William Henry Harrison died of pneumonia
just a month after taking office. The Rich-
mond Enquirer published the news of his
death two days later on April 6th. The North-
Carolina standard newspaper published it on
April 14th. His death wasn’t known of in Los
Angeles until July 23rd, 110 days after it had
occurred.”
Text by Zack Bloom, A Quick History of Digital Communication Before the
Internet. https://eager.io/blog/communication-pre-internet/
Picture by By Albert Sands Southworth and Josiah Johnson Hawes
3/32
The speed of communication in the 19th century
Francis Galton Isochronic Map
4/32
The speed of communication in the 21st century
RTT data gathered via http://www.azurespeed.com
5/32
The speed of communication in the 21st century
If you really like high latencies . . .
Time delay between Mars and Earth
blogs.esa.int/mex/2012/08/05/time-delay-between-mars-and-earth/
Delay/Disruption Tolerant Networking
www.nasa.gov/content/dtn
6/32
Latency magnitudes
Geo-replication
λ, up to 50ms (local region DC)
Λ, between 100ms and 300ms (inter-continental)
No inter-DC replication
Client writes observe λ latency
Planet-wide geo-replication
Replication techniques versus client side write latency ranges
Consensus/Paxos [Λ, 2Λ] (with no divergence)
Primary-Backup [λ, Λ] (asynchronous/lazy)
Multi-Master λ (allowing divergence)
7/32
EC and CAP for Geo-Replication
Eventually Consistent. CACM 2009, Werner Vogels
In an ideal world there would be only one consistency model:
when an update is made all observers would see that update.
Building reliable distributed systems at a worldwide scale
demands trade-offs between consistency and availability.
CAP theorem. PODC 2000, Eric Brewer
Of three properties of shared-data systems – data consistency,
system availability, and tolerance to network partition – only two
can be achieved at any given time.
CRDTs provide support for partition-tolerant high availability
8/32
From sequential to concurrent executions
Consensus provides illusion of a single replica
This also preserves (slow) sequential behaviour
Sequential execution
Ops O o // p // q
Time //
We have an ordered set (O, <). O = {o, p, q} and o < p < q
8/32
From sequential to concurrent executions
Consensus provides illusion of a single replica
This also preserves (slow) sequential behaviour
Sequential execution
Ops O o // p // q
Time //
We have an ordered set (O, <). O = {o, p, q} and o < p < q
9/32
From sequential to concurrent executions
EC Multi-master (or active-active) can expose concurrency
Concurrent execution
p // q

Ops O o
??
r
s
77
Time //
Partially ordered set (O, ). o p q r and o s r
Some ops in O are concurrent: p s and q s
10/32
Design of Conflict-Free Replicated Data Types
A partially ordered log (polog) of operations implements any CRDT
Replicas keep increasing local views of an evolving distributed polog
Any query, at replica i, can be expressed from local polog Oi
Example: Counter at i is |{inc | inc ∈ Oi }| − |{dec | dec ∈ Oi }|
CRDTs are efficient representations that follow some general rules
10/32
Design of Conflict-Free Replicated Data Types
A partially ordered log (polog) of operations implements any CRDT
Replicas keep increasing local views of an evolving distributed polog
Any query, at replica i, can be expressed from local polog Oi
Example: Counter at i is |{inc | inc ∈ Oi }| − |{dec | dec ∈ Oi }|
CRDTs are efficient representations that follow some general rules
10/32
Design of Conflict-Free Replicated Data Types
A partially ordered log (polog) of operations implements any CRDT
Replicas keep increasing local views of an evolving distributed polog
Any query, at replica i, can be expressed from local polog Oi
Example: Counter at i is |{inc | inc ∈ Oi }| − |{dec | dec ∈ Oi }|
CRDTs are efficient representations that follow some general rules
11/32
Principle of permutation equivalence
If operations in sequence can commute, preserving a given result,
then under concurrency they should preserve the same result
Sequential
inc(10) // inc(35) // dec(5) // inc(2)
dec(5) // inc(2) // inc(10) // inc(35)
Concurrent
inc(35)

inc(10)
88

inc(2)
dec(5)
88
You guessed: Result is 42
12/32
Implementing Counters
Example: CRDT PNCounters
A inc(35)

B inc(10)
88

inc(2)
C dec(5)
88
Lets track total number of incs and decs done at each replica
{A(incs, decs), . . . , C(. . . , . . .)}
13/32
Implementing Counters
Example: CRDT PNCounters
Separate positive and negative counts are kept per replica
A {A(35, 0), B(10, 0)}
++
B {B(10, 0)}
66
((
{A(35, 0), B(12, 0), C(0, 5)}
C {B(10, 0), C(0, 5)}
33
Joining does point-wise maximums among entries (semilattice)
At any time, counter value is sum of incs minus sum of decs
13/32
Implementing Counters
Example: CRDT PNCounters
Separate positive and negative counts are kept per replica
A {A(35, 0), B(10, 0)}
++
B {B(10, 0)}
66
((
{A(35, 0), B(12, 0), C(0, 5)}
C {B(10, 0), C(0, 5)}
33
Joining does point-wise maximums among entries (semilattice)
At any time, counter value is sum of incs minus sum of decs
14/32
Implementing Counters
Redis CRDT Counters
There are multiple ways to implement CRDT counters
Redis has a distinct implementation that favours garbage collection
Redis CRDT counters are 59 bits (not 64) to avoid overflows
15/32
Registers
Registers are an ordered set of write operations
Sequential execution
A wr(x) // wr(j) // wr(k) // wr(x)
Sequential execution under distribution
A wr(x)
%%
wr(x)
B wr(j) // wr(k)
99
Register value is x, the last written value
16/32
Implementing Registers
Naive Last-Writer-Wins
CRDT register implemented by attaching local wall-clock times
Sequential execution under distribution
A (11:00)x
''
(11:30)?
##
B (12:02)j // (12:05)k
77
?
Problem: Wall-clock on B is one hour ahead of A
Value x might not be writeable again at A since 12:05  11:30
17/32
Registers
Sequential Semantics
Register shows value v at replica i iff
wr(v) ∈ Oi
and
wr(v) ∈ Oi · wr(v)  wr(v )
18/32
Preservation of sequential semantics
Concurrent semantics should preserve the sequential semantics
This also ensures correct sequential execution under distribution
19/32
Multi-value Registers
Concurrency semantics shows all concurrent values
{v | wr(v) ∈ Oi ∧ wr(v ) ∈ Oi · wr(v) wr(v )}
Concurrent execution
A wr(x)
%%
// wr(y) // {y, k} // wr(m) // {m}
B wr(j) // wr(k)
99
Dynamo shopping carts are multi-value registers with payload sets
The m value could be an application level merge of values y and k
20/32
Implementing Multi-value Registers
Concurrency can be preciselly tracked with version vectors
Concurrent execution (version vectors)
A [1, 0]x
%%
// [2, 0]y // [2, 0]y, [1, 2]k // [3, 2]m
B [1, 1]j // [1, 2]k
66
Metadata can be compressed with a common causal context and a
single scalar per value (dotted version vectors)
21/32
Registers in Redis
LWW arbitration
Multi-value registers allows executions leading to concurrent values
Presenting concurrent values is at odds with the sequential API
Redis both tracks causality and registers wall-clock times
Querying uses Last-Writer-Wins selection among concurrent values
This preserves correctness of sequential semantics
A value with clock 12:05 can still be causally overwritten at 11:30
22/32
Sets
Sequential Semantics
Consider add and rmv operations
X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X
X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X
In general, given Oi , the set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e)  rmv(e)}
22/32
Sets
Sequential Semantics
Consider add and rmv operations
X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X
X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X
In general, given Oi , the set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e)  rmv(e)}
22/32
Sets
Sequential Semantics
Consider add and rmv operations
X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X
X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X
In general, given Oi , the set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e)  rmv(e)}
22/32
Sets
Sequential Semantics
Consider add and rmv operations
X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X
X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X
In general, given Oi , the set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e)  rmv(e)}
22/32
Sets
Sequential Semantics
Consider add and rmv operations
X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X
X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X
In general, given Oi , the set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e)  rmv(e)}
23/32
Sets
Concurrency Semantics
Problem: Concurrently adding and removing the same element
Concurrent execution
A add(x)
%%
// rmv(x) // {?} // add(x) // {x}
B rmv(x) // add(x)
::
24/32
Concurrency Semantics
Add-Wins Sets
Let’s choose Add-Wins
Consider a set of known operations Oi , at node i, that is ordered
by an happens-before partial order . Set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
Is this familiar?
The sequential semantics applies identical rules on a total order
Redis CRDT sets are Add-Wins Sets
24/32
Concurrency Semantics
Add-Wins Sets
Let’s choose Add-Wins
Consider a set of known operations Oi , at node i, that is ordered
by an happens-before partial order . Set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
Is this familiar?
The sequential semantics applies identical rules on a total order
Redis CRDT sets are Add-Wins Sets
24/32
Concurrency Semantics
Add-Wins Sets
Let’s choose Add-Wins
Consider a set of known operations Oi , at node i, that is ordered
by an happens-before partial order . Set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
Is this familiar?
The sequential semantics applies identical rules on a total order
Redis CRDT sets are Add-Wins Sets
24/32
Concurrency Semantics
Add-Wins Sets
Let’s choose Add-Wins
Consider a set of known operations Oi , at node i, that is ordered
by an happens-before partial order . Set has elements
{e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
Is this familiar?
The sequential semantics applies identical rules on a total order
Redis CRDT sets are Add-Wins Sets
25/32
Equivalence to a sequential execution?
Add-Wins Sets
Can we always explain a concurrent execution by a sequential one?
Concurrent execution
A {x, y} // add(y) // rmv(x) // {y} //
##
{x, y}
B {x, y} // add(x) // rmv(y) // {x} //
;;
{x, y}
Two (failed) sequential explanations
H1 {x, y} // . . . // rmv(x) // { x, y}
H2 {x, y} // . . . // rmv(y) // {x, y}
Concurrent executions can have richer outcomes
26/32
Concurrency Semantics
Remove-Wins Sets
Alternative: Let’s choose Remove-Wins
Xi
.
= {e | add(e) ∈ Oi ∧ ∀ rmv(e) ∈ Oi · rmv(e) add(e)}
Remove-Wins requires more metadata than Add-Wins
Both Add and Remove-Wins have same semantics in a total order
They are different but both preserve sequential semantics
26/32
Concurrency Semantics
Remove-Wins Sets
Alternative: Let’s choose Remove-Wins
Xi
.
= {e | add(e) ∈ Oi ∧ ∀ rmv(e) ∈ Oi · rmv(e) add(e)}
Remove-Wins requires more metadata than Add-Wins
Both Add and Remove-Wins have same semantics in a total order
They are different but both preserve sequential semantics
27/32
Choice of semantics
Design freedom is limited by preservation of sequential semantics
Delaying choice of semantics to query time
A CRDT Set data type could store enough information to allow a
parametrized query that shows either Add-Wins or Remove-Wins
Open question: Reason for an extended concurrency aware API?
28/32
Sequence/List
Weak/Strong Specification [Attiya et al, PODC 16]
Element x is kept
rpush(b) // xb
$$
// lpush(x) // x
99
%%
axb
lpush(a) // ax
::
Element x is removed (Redis enforces Strong Specification)
rpush(b) // xb
##
// lpush(x) // x
::
$$
// rem(x) // // ab ¬ ba
lpush(a) // ax
;;
29/32
Take home message
Concurrent executions are needed to deal with latency
Behaviour changes when moving from sequential to concurrent
Road to accommodate transition:
Permutation equivalence
Preserving sequential semantics
Concurrent executions lead to richer outcomes
CRDTs provide sound guidelines and encode policies
30/32
Thanks and Questions
Thanks to Yiftach, Yuval, Yossi, Meir, Cihan for their inputs,
and my colleagues for early feedback
Glad to address any questions
Carlos Baquero, cbm@di.uminho.pt, @xmal
#Redis #RedisConf
31/32
Causal Consistency
Redis CRDTs provide per-key causal consistency
Source FIFO (TCP)
A apple

%%
B apple pie
##
hot
##
C pie hot peach? apple
Causal consistency
A apple
%%
B apple
%%
pie
$$
hot

C apple pie hot tasty
Strongest highly available consistency model
32/32
Causal Consistency
Redis CRDTs provide per-key causal consistency
Source FIFO (TCP)
A add(a)
''

B add(a) rmv(a)
''
add(b)
''
C rmv(a) add(b) add(a)
Causal consistency
A add(a)
%%
B add(a)
%%
rmv(a)

add(b)
%%
C add(a) rmv(a) add(b)
Strongest highly available consistency model

More Related Content

What's hot

Time and space complexity
Time and space complexityTime and space complexity
Time and space complexity
Ankit Katiyar
 
Time complexity (linear search vs binary search)
Time complexity (linear search vs binary search)Time complexity (linear search vs binary search)
Time complexity (linear search vs binary search)
Kumar
 

What's hot (20)

Theory of Automata and formal languages Unit 5
Theory of Automata and formal languages Unit 5Theory of Automata and formal languages Unit 5
Theory of Automata and formal languages Unit 5
 
SLE2015: Distributed ATL
SLE2015: Distributed ATLSLE2015: Distributed ATL
SLE2015: Distributed ATL
 
Algorithm.ppt
Algorithm.pptAlgorithm.ppt
Algorithm.ppt
 
Lec7
Lec7Lec7
Lec7
 
Parametrized Model Checking of Fault Tolerant Distributed Algorithms by Abstr...
Parametrized Model Checking of Fault Tolerant Distributed Algorithms by Abstr...Parametrized Model Checking of Fault Tolerant Distributed Algorithms by Abstr...
Parametrized Model Checking of Fault Tolerant Distributed Algorithms by Abstr...
 
High Performance Systems Without Tears - Scala Days Berlin 2018
High Performance Systems Without Tears - Scala Days Berlin 2018High Performance Systems Without Tears - Scala Days Berlin 2018
High Performance Systems Without Tears - Scala Days Berlin 2018
 
Time andspacecomplexity
Time andspacecomplexityTime andspacecomplexity
Time andspacecomplexity
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Forward kinematics
Forward kinematicsForward kinematics
Forward kinematics
 
Time and space complexity
Time and space complexityTime and space complexity
Time and space complexity
 
20121021 bspapproach tiskin
20121021 bspapproach tiskin20121021 bspapproach tiskin
20121021 bspapproach tiskin
 
Time complexity (linear search vs binary search)
Time complexity (linear search vs binary search)Time complexity (linear search vs binary search)
Time complexity (linear search vs binary search)
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
 
Quantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AIQuantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AI
 
R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R
 
Task Constrained Motion Planning for Snake Robot
Task Constrained Motion Planning for Snake RobotTask Constrained Motion Planning for Snake Robot
Task Constrained Motion Planning for Snake Robot
 
Automata theory - Push Down Automata (PDA)
Automata theory - Push Down Automata (PDA)Automata theory - Push Down Automata (PDA)
Automata theory - Push Down Automata (PDA)
 
Complexity of Algorithm
Complexity of AlgorithmComplexity of Algorithm
Complexity of Algorithm
 
Lec12 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Ad...
Lec12 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Ad...Lec12 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Ad...
Lec12 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Ad...
 
Kumegawa russia
Kumegawa russiaKumegawa russia
Kumegawa russia
 

Similar to CRDTs and Redis

Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
James Wong
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
Fraboni Ec
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
Tony Nguyen
 

Similar to CRDTs and Redis (20)

R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Area efficient parallel LFSR for cyclic redundancy check
Area efficient parallel LFSR for cyclic redundancy check  Area efficient parallel LFSR for cyclic redundancy check
Area efficient parallel LFSR for cyclic redundancy check
 
I04124052057
I04124052057I04124052057
I04124052057
 
Constraint Programming in Haskell
Constraint Programming in HaskellConstraint Programming in Haskell
Constraint Programming in Haskell
 
DESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGIC
DESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGICDESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGIC
DESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGIC
 
slides.07.pptx
slides.07.pptxslides.07.pptx
slides.07.pptx
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Means
 
Prpagation of Error Bounds Across reduction interfaces
Prpagation of Error Bounds Across reduction interfacesPrpagation of Error Bounds Across reduction interfaces
Prpagation of Error Bounds Across reduction interfaces
 
AbdoSummerANS_mod3
AbdoSummerANS_mod3AbdoSummerANS_mod3
AbdoSummerANS_mod3
 
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by OraclesEfficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Planning Under Uncertainty With Markov Decision Processes
Planning Under Uncertainty With Markov Decision ProcessesPlanning Under Uncertainty With Markov Decision Processes
Planning Under Uncertainty With Markov Decision Processes
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 

Recently uploaded

If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
Kayode Fayemi
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
Sheetaleventcompany
 

Recently uploaded (20)

Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 

CRDTs and Redis

  • 1. CRDTs and Redis From sequential to concurrent executions Carlos Baquero HASLab, INESC TEC & Univ. Minho, Portugal April 26th 2018 1/32
  • 2. 2/32 The speed of communication in the 19th century W. H. Harrison’s death “At 12:30 am on April 4th, 1841 President William Henry Harrison died of pneumonia just a month after taking office. The Rich- mond Enquirer published the news of his death two days later on April 6th. The North- Carolina standard newspaper published it on April 14th. His death wasn’t known of in Los Angeles until July 23rd, 110 days after it had occurred.” Text by Zack Bloom, A Quick History of Digital Communication Before the Internet. https://eager.io/blog/communication-pre-internet/ Picture by By Albert Sands Southworth and Josiah Johnson Hawes
  • 3. 3/32 The speed of communication in the 19th century Francis Galton Isochronic Map
  • 4. 4/32 The speed of communication in the 21st century RTT data gathered via http://www.azurespeed.com
  • 5. 5/32 The speed of communication in the 21st century If you really like high latencies . . . Time delay between Mars and Earth blogs.esa.int/mex/2012/08/05/time-delay-between-mars-and-earth/ Delay/Disruption Tolerant Networking www.nasa.gov/content/dtn
  • 6. 6/32 Latency magnitudes Geo-replication λ, up to 50ms (local region DC) Λ, between 100ms and 300ms (inter-continental) No inter-DC replication Client writes observe λ latency Planet-wide geo-replication Replication techniques versus client side write latency ranges Consensus/Paxos [Λ, 2Λ] (with no divergence) Primary-Backup [λ, Λ] (asynchronous/lazy) Multi-Master λ (allowing divergence)
  • 7. 7/32 EC and CAP for Geo-Replication Eventually Consistent. CACM 2009, Werner Vogels In an ideal world there would be only one consistency model: when an update is made all observers would see that update. Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability. CAP theorem. PODC 2000, Eric Brewer Of three properties of shared-data systems – data consistency, system availability, and tolerance to network partition – only two can be achieved at any given time. CRDTs provide support for partition-tolerant high availability
  • 8. 8/32 From sequential to concurrent executions Consensus provides illusion of a single replica This also preserves (slow) sequential behaviour Sequential execution Ops O o // p // q Time // We have an ordered set (O, <). O = {o, p, q} and o < p < q
  • 9. 8/32 From sequential to concurrent executions Consensus provides illusion of a single replica This also preserves (slow) sequential behaviour Sequential execution Ops O o // p // q Time // We have an ordered set (O, <). O = {o, p, q} and o < p < q
  • 10. 9/32 From sequential to concurrent executions EC Multi-master (or active-active) can expose concurrency Concurrent execution p // q Ops O o ?? r s 77 Time // Partially ordered set (O, ). o p q r and o s r Some ops in O are concurrent: p s and q s
  • 11. 10/32 Design of Conflict-Free Replicated Data Types A partially ordered log (polog) of operations implements any CRDT Replicas keep increasing local views of an evolving distributed polog Any query, at replica i, can be expressed from local polog Oi Example: Counter at i is |{inc | inc ∈ Oi }| − |{dec | dec ∈ Oi }| CRDTs are efficient representations that follow some general rules
  • 12. 10/32 Design of Conflict-Free Replicated Data Types A partially ordered log (polog) of operations implements any CRDT Replicas keep increasing local views of an evolving distributed polog Any query, at replica i, can be expressed from local polog Oi Example: Counter at i is |{inc | inc ∈ Oi }| − |{dec | dec ∈ Oi }| CRDTs are efficient representations that follow some general rules
  • 13. 10/32 Design of Conflict-Free Replicated Data Types A partially ordered log (polog) of operations implements any CRDT Replicas keep increasing local views of an evolving distributed polog Any query, at replica i, can be expressed from local polog Oi Example: Counter at i is |{inc | inc ∈ Oi }| − |{dec | dec ∈ Oi }| CRDTs are efficient representations that follow some general rules
  • 14. 11/32 Principle of permutation equivalence If operations in sequence can commute, preserving a given result, then under concurrency they should preserve the same result Sequential inc(10) // inc(35) // dec(5) // inc(2) dec(5) // inc(2) // inc(10) // inc(35) Concurrent inc(35) inc(10) 88 inc(2) dec(5) 88 You guessed: Result is 42
  • 15. 12/32 Implementing Counters Example: CRDT PNCounters A inc(35) B inc(10) 88 inc(2) C dec(5) 88 Lets track total number of incs and decs done at each replica {A(incs, decs), . . . , C(. . . , . . .)}
  • 16. 13/32 Implementing Counters Example: CRDT PNCounters Separate positive and negative counts are kept per replica A {A(35, 0), B(10, 0)} ++ B {B(10, 0)} 66 (( {A(35, 0), B(12, 0), C(0, 5)} C {B(10, 0), C(0, 5)} 33 Joining does point-wise maximums among entries (semilattice) At any time, counter value is sum of incs minus sum of decs
  • 17. 13/32 Implementing Counters Example: CRDT PNCounters Separate positive and negative counts are kept per replica A {A(35, 0), B(10, 0)} ++ B {B(10, 0)} 66 (( {A(35, 0), B(12, 0), C(0, 5)} C {B(10, 0), C(0, 5)} 33 Joining does point-wise maximums among entries (semilattice) At any time, counter value is sum of incs minus sum of decs
  • 18. 14/32 Implementing Counters Redis CRDT Counters There are multiple ways to implement CRDT counters Redis has a distinct implementation that favours garbage collection Redis CRDT counters are 59 bits (not 64) to avoid overflows
  • 19. 15/32 Registers Registers are an ordered set of write operations Sequential execution A wr(x) // wr(j) // wr(k) // wr(x) Sequential execution under distribution A wr(x) %% wr(x) B wr(j) // wr(k) 99 Register value is x, the last written value
  • 20. 16/32 Implementing Registers Naive Last-Writer-Wins CRDT register implemented by attaching local wall-clock times Sequential execution under distribution A (11:00)x '' (11:30)? ## B (12:02)j // (12:05)k 77 ? Problem: Wall-clock on B is one hour ahead of A Value x might not be writeable again at A since 12:05 11:30
  • 21. 17/32 Registers Sequential Semantics Register shows value v at replica i iff wr(v) ∈ Oi and wr(v) ∈ Oi · wr(v) wr(v )
  • 22. 18/32 Preservation of sequential semantics Concurrent semantics should preserve the sequential semantics This also ensures correct sequential execution under distribution
  • 23. 19/32 Multi-value Registers Concurrency semantics shows all concurrent values {v | wr(v) ∈ Oi ∧ wr(v ) ∈ Oi · wr(v) wr(v )} Concurrent execution A wr(x) %% // wr(y) // {y, k} // wr(m) // {m} B wr(j) // wr(k) 99 Dynamo shopping carts are multi-value registers with payload sets The m value could be an application level merge of values y and k
  • 24. 20/32 Implementing Multi-value Registers Concurrency can be preciselly tracked with version vectors Concurrent execution (version vectors) A [1, 0]x %% // [2, 0]y // [2, 0]y, [1, 2]k // [3, 2]m B [1, 1]j // [1, 2]k 66 Metadata can be compressed with a common causal context and a single scalar per value (dotted version vectors)
  • 25. 21/32 Registers in Redis LWW arbitration Multi-value registers allows executions leading to concurrent values Presenting concurrent values is at odds with the sequential API Redis both tracks causality and registers wall-clock times Querying uses Last-Writer-Wins selection among concurrent values This preserves correctness of sequential semantics A value with clock 12:05 can still be causally overwritten at 11:30
  • 26. 22/32 Sets Sequential Semantics Consider add and rmv operations X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X In general, given Oi , the set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
  • 27. 22/32 Sets Sequential Semantics Consider add and rmv operations X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X In general, given Oi , the set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
  • 28. 22/32 Sets Sequential Semantics Consider add and rmv operations X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X In general, given Oi , the set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
  • 29. 22/32 Sets Sequential Semantics Consider add and rmv operations X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X In general, given Oi , the set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
  • 30. 22/32 Sets Sequential Semantics Consider add and rmv operations X = {. . .}, add(a) −→ add(c) we observe that a, c ∈ X X = {. . .}, add(c) −→ rmv(c) we observe that c ∈ X In general, given Oi , the set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)}
  • 31. 23/32 Sets Concurrency Semantics Problem: Concurrently adding and removing the same element Concurrent execution A add(x) %% // rmv(x) // {?} // add(x) // {x} B rmv(x) // add(x) ::
  • 32. 24/32 Concurrency Semantics Add-Wins Sets Let’s choose Add-Wins Consider a set of known operations Oi , at node i, that is ordered by an happens-before partial order . Set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)} Is this familiar? The sequential semantics applies identical rules on a total order Redis CRDT sets are Add-Wins Sets
  • 33. 24/32 Concurrency Semantics Add-Wins Sets Let’s choose Add-Wins Consider a set of known operations Oi , at node i, that is ordered by an happens-before partial order . Set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)} Is this familiar? The sequential semantics applies identical rules on a total order Redis CRDT sets are Add-Wins Sets
  • 34. 24/32 Concurrency Semantics Add-Wins Sets Let’s choose Add-Wins Consider a set of known operations Oi , at node i, that is ordered by an happens-before partial order . Set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)} Is this familiar? The sequential semantics applies identical rules on a total order Redis CRDT sets are Add-Wins Sets
  • 35. 24/32 Concurrency Semantics Add-Wins Sets Let’s choose Add-Wins Consider a set of known operations Oi , at node i, that is ordered by an happens-before partial order . Set has elements {e | add(e) ∈ Oi ∧ rmv(e) ∈ Oi · add(e) rmv(e)} Is this familiar? The sequential semantics applies identical rules on a total order Redis CRDT sets are Add-Wins Sets
  • 36. 25/32 Equivalence to a sequential execution? Add-Wins Sets Can we always explain a concurrent execution by a sequential one? Concurrent execution A {x, y} // add(y) // rmv(x) // {y} // ## {x, y} B {x, y} // add(x) // rmv(y) // {x} // ;; {x, y} Two (failed) sequential explanations H1 {x, y} // . . . // rmv(x) // { x, y} H2 {x, y} // . . . // rmv(y) // {x, y} Concurrent executions can have richer outcomes
  • 37. 26/32 Concurrency Semantics Remove-Wins Sets Alternative: Let’s choose Remove-Wins Xi . = {e | add(e) ∈ Oi ∧ ∀ rmv(e) ∈ Oi · rmv(e) add(e)} Remove-Wins requires more metadata than Add-Wins Both Add and Remove-Wins have same semantics in a total order They are different but both preserve sequential semantics
  • 38. 26/32 Concurrency Semantics Remove-Wins Sets Alternative: Let’s choose Remove-Wins Xi . = {e | add(e) ∈ Oi ∧ ∀ rmv(e) ∈ Oi · rmv(e) add(e)} Remove-Wins requires more metadata than Add-Wins Both Add and Remove-Wins have same semantics in a total order They are different but both preserve sequential semantics
  • 39. 27/32 Choice of semantics Design freedom is limited by preservation of sequential semantics Delaying choice of semantics to query time A CRDT Set data type could store enough information to allow a parametrized query that shows either Add-Wins or Remove-Wins Open question: Reason for an extended concurrency aware API?
  • 40. 28/32 Sequence/List Weak/Strong Specification [Attiya et al, PODC 16] Element x is kept rpush(b) // xb $$ // lpush(x) // x 99 %% axb lpush(a) // ax :: Element x is removed (Redis enforces Strong Specification) rpush(b) // xb ## // lpush(x) // x :: $$ // rem(x) // // ab ¬ ba lpush(a) // ax ;;
  • 41. 29/32 Take home message Concurrent executions are needed to deal with latency Behaviour changes when moving from sequential to concurrent Road to accommodate transition: Permutation equivalence Preserving sequential semantics Concurrent executions lead to richer outcomes CRDTs provide sound guidelines and encode policies
  • 42. 30/32 Thanks and Questions Thanks to Yiftach, Yuval, Yossi, Meir, Cihan for their inputs, and my colleagues for early feedback Glad to address any questions Carlos Baquero, cbm@di.uminho.pt, @xmal #Redis #RedisConf
  • 43. 31/32 Causal Consistency Redis CRDTs provide per-key causal consistency Source FIFO (TCP) A apple %% B apple pie ## hot ## C pie hot peach? apple Causal consistency A apple %% B apple %% pie $$ hot C apple pie hot tasty Strongest highly available consistency model
  • 44. 32/32 Causal Consistency Redis CRDTs provide per-key causal consistency Source FIFO (TCP) A add(a) '' B add(a) rmv(a) '' add(b) '' C rmv(a) add(b) add(a) Causal consistency A add(a) %% B add(a) %% rmv(a) add(b) %% C add(a) rmv(a) add(b) Strongest highly available consistency model