Explains a novel distributed algorithm for constructing distributed doubly linked lists (or bidirectional rings), which are common in structured P2P networks.
This presentation was given at the 2015 IEEE International Conference on Peer-to-Peer Computing (P2P2015).
The paper is available at
<http://www.media.osaka-cu.ac.jp/~k-abe/research/Constructing_Distributed_Doubly_Linked_Lists_without_Distributed_Locking.html>
Authors: Kota Abe (Osaka City University/NICT), Mikio Yoshida (BBR Inc.)
Abstract:
A distributed doubly linked list (or bidirectional ring) is a fundamental distributed data structure commonly used in structured peer-to-peer networks. This paper presents DDLL, a novel decentralized algorithm for constructing distributed doubly linked lists. In the absence of failure, DDLL maintains consistency with regard to lookups of nodes, even while multiple nodes are simultaneously being inserted or deleted. Unlike existing algorithms, DDLL adopts a novel strategy based on conflict detection and sequence numbers. A formal description and correctness proofs are given. Simulation results show that DDLL outperforms conventional algorithms in terms of both time and number of messages.
Constructing Distributed Doubly Linked Lists without Distributed Locking
1. Constructing Distributed Doubly Linked List without Distributed Locking
IEEE Peer-to-Peer Conference 2015
Sep 23rd–24th, 2015
Kota Abe, Osaka City University / NICT, Japan
Mikio Yoshida, BBR Inc., Japan
2. Outline
Background
What is distributed doubly linked list
Conventional approaches
The DDLL algorithm
Procedure for node insertion, deletion and traversal
Procedure for recovery from failure
Evaluation
Comparison with conventional algorithms
Conclusion
4. Distributed Doubly Linked List
aka Bidirectional Ring
Commonly used in structured
P2P networks
Chord, Chord#, Skip Graph,
SkipNet, etc.
Structure
Each node holds a pointer (e.g. IP address) to the next (successor) node and to the previous (predecessor) node
We call these the right and left pointers
Sorted by node-specific key
Circular
[Figure: a ring of nodes with keys 0, 10, 20, 30, 40, 50, 60 and 70]
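The structure described on this slide can be sketched in a few lines of Python. This is only an illustration: the names `RingNode`, `build_ring` and `in_open_interval` are hypothetical, and the pointers are ordinary in-process references rather than network addresses.

```python
class RingNode:
    """One list element; `left`/`right` stand in for remote pointers (assumption)."""
    def __init__(self, key):
        self.key = key
        self.left = None    # predecessor ("left pointer")
        self.right = None   # successor ("right pointer")

def build_ring(keys):
    """Link nodes into a circular doubly linked list sorted by key."""
    nodes = [RingNode(k) for k in sorted(keys)]
    n = len(nodes)
    for i, node in enumerate(nodes):
        node.right = nodes[(i + 1) % n]
        node.left = nodes[(i - 1) % n]
    return nodes

def in_open_interval(x, a, b):
    """True if key x lies strictly between a and b when moving rightward on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b   # the interval wraps past the largest key

ring = build_ring([0, 20, 60, 40, 70, 10, 50, 30])
```

The circular interval test is the part that matters later: deciding whether a node u belongs between p and q has to account for the wrap-around from the largest key back to the smallest.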
5. Maintaining Distributed Doubly Linked List
Challenges
Nodes are distributed and may be simultaneously
and independently inserted and deleted
Nodes may fail
[Figure: the four operations on the list: insertion of u between p and q, deletion of u, recovery, and traversal]
6. Conventional Approaches (1/2)
Eventual Consistency Approach
Node insertion and deletion temporarily break the list structure
A stabilizing procedure recovers it
Example: Chord
Distributed Locking Approach
Use a lock 🔒 to mutually exclude node insertion / deletion
Example: Atomic Ring Maintenance (Ghodsi)
[Figure: lock-based insertion of u between p and q, using the messages JoinReq, JoinPoint, NewSucc, NewSuccAck and JoinDone, with two lock (🔒) / unlock (🔓) pairs]
7. Conventional Approaches (2/2)
Eventual Consistency Approach
Pros 👍
Easy to recover from failure
Cons 👎
No lookup consistency: lookup results may differ depending on the querying node
Distributed Locking Approach
Pros 👍
Lookup consistency
Cons 👎
A lock blocks other node insertions / deletions
When a node fails, the locking duration may be quite long
The recovery procedure is rather complicated
Locks are released by timeout, which may be premature
→ Locks should be avoided if possible
9. Our Contribution — DDLL Algorithm
DDLL = a distributed algorithm for constructing distributed doubly linked lists
Acronym of “Distributed Doubly Linked List”
Guarantees lookup consistency without using distributed locking (in the absence of failure)
Simple and efficient
Proven correct (insertion and deletion procedures)
Practical
Works with non-FIFO channels (e.g. UDP)
Used in our PIAX P2P platform as a foundation of the Skip Graph and Chord# implementations
10. Node Insertion
u is going to be inserted between p and q
(1) u.l := p, u.r := q
(2) Update right link: change p’s right link to u
(3) Update left link: change q’s left link to u
11. Updating Right Link (1/3)
We want to change p’s right link only if there is no conflict
Conflicts:
p has been deleted
q has been deleted
another node has been inserted between p and q
12. Updating Right Link (2/3)
A SetR message is used for updating a right link
A SetR message contains:
the new right node
the expected right node of the recipient node
When a SetR request is accepted, p returns a SetRAck message
Otherwise, p returns a SetRNak message
Example: u sends SetR(u, q) to p: “Please change your right link to me (u) if your right link still points to q and you have not initiated deletion”
p replies with SetRAck: “Ok!”
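The accept/reject rule for SetR amounts to a compare-and-set on the recipient's right link. The following is a minimal sketch under that reading; the dictionary layout and the function name `handle_setr` are assumptions made for illustration, and sequence numbers are left out.

```python
def handle_setr(node, new_right, expected_right):
    """Update node's right link only if there is no conflict.

    `node` is a dict with hypothetical fields "state" ("in" once inserted,
    "del" once deletion has been initiated) and "r" (current right node).
    """
    if node["state"] == "in" and node["r"] == expected_right:
        node["r"] = new_right
        return "SetRAck"
    return "SetRNak"

p = {"state": "in", "r": "q"}
first = handle_setr(p, "u", "q")     # no conflict: p.r becomes u
second = handle_setr(p, "v", "q")    # conflict: p.r is now u, not q

leaving = {"state": "del", "r": "q"}
third = handle_setr(leaving, "u", "q")   # conflict: recipient is being deleted
```

All three conflict cases from the previous slide fall out of the same check: a deleted p fails the state test, while a deleted q or a newly inserted node makes `p.r` differ from the expected right node.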
13. Updating Right Link (3/3)
Right links are always correct without using locking
Conflict case example: another node v has been inserted between p and q
u sends SetR(u, q) to p, but p.r != q
p replies with SetRNak: “Sorry!”
14. Updating Left Link (1/3)
[Figure: message sequence and topology change for two insertions, u and v, to the left of q: SetR(u, q), SetRAck, SetL(u), SetR(v, q), SetRAck, SetL(v)]
Problem:
Multiple SetL messages arrive from different nodes in arbitrary order (because we do not want to use locking)
The node must determine which SetL message is newer
15. Updating Left Link (2/3)
Solution:
SetL message contains a sequence number (seq)
Each node holds a sequence number for its right node (rseq)
rseq is transferred using SetRAck
Each node holds the max sequence number of SetL messages received so far (lseq)
SetL message is accepted only if msg.seq > lseq
[Figure: worked example in which rseq and lseq advance from 0 to 1 to 2 as u and then v are inserted between p and q]
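The acceptance rule for SetL can be sketched as follows. The class name `LeftLink` and its fields are hypothetical; only the rule "accept if msg.seq > lseq" comes from the slide. The example delivers two SetL messages in the wrong order to show that the older one is dropped.

```python
class LeftLink:
    """Left pointer of one node plus the largest SetL sequence number seen so far."""
    def __init__(self, left, lseq=0):
        self.left = left
        self.lseq = lseq

    def on_setl(self, new_left, seq):
        """Apply a SetL message only if it is newer than anything seen before."""
        if seq > self.lseq:
            self.left, self.lseq = new_left, seq
            return True
        return False   # stale message: ignore it

q = LeftLink(left="p", lseq=0)
accepted_v = q.on_setl("v", 2)   # the newer message happens to arrive first
accepted_u = q.on_setl("u", 1)   # the older message arrives late
```

Because the decision depends only on the sequence number carried by the message, the result is the same regardless of delivery order, which is why a FIFO channel is not needed.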
16. Updating Left Link (3/3)
How our scheme solves the previous case
[Figure: message sequence SetR(u, q, 0), SetRAck(1), SetL(u, 1), SetR(v, q), SetRAck(2), SetL(v, 2); q’s lseq goes from 0 to 2]
SetL(u, 1) arrives after SetL(v, 2), so it is stale and ignored
Lock is not necessary!
17. Node Insertion Sequence
u is inserted between p and q
Before insertion: p.rseq = q.lseq = i, u.lseq = 0
u → p: SetR(u, q, 0)
p → u: SetRAck(i+1)
p → q: SetL(u, i+1)
After insertion: p.rseq = u.lseq = 0, u.rseq = q.lseq = i+1
18. Node Deletion Sequence
u is deleted from between p and q
Before deletion: p.rseq = u.lseq = i1, u.rseq = q.lseq = i2
u → p: SetR(q, u, i2+1)
p → u: SetRAck(i1+1) (i1+1 is not used)
p → q: SetL(p, i2+1)
After deletion: p.rseq = q.lseq = i2+1
19. Insertion and Deletion
3 messages are required for insertion/deletion
A node is atomically inserted/deleted when its SetR message is accepted
If a SetRNak message is received, the application retries insertion/deletion
Right links are always correct
Left links are correct when there is no SetL message in transit
No distributed locking
Does not require FIFO channels (UDP friendly)
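To make the three-message exchange concrete, here is a small single-process simulation of the insertion and deletion procedures. It follows the actions labelled (A1) to (A7) in the paper's Fig. 1, but it is only a sketch: the `Net` class is a hypothetical in-memory FIFO queue standing in for the network, keys double as node addresses, and failures and retries are not modelled.

```python
from collections import deque

def between(x, a, b):
    """True if x is in the open ring interval (a, b), going rightward from a."""
    return (a < x < b) if a < b else (x > a or x < b)

class Net:
    """Hypothetical in-memory network: a queue of (src, dst, message) triples."""
    def __init__(self):
        self.nodes, self.queue, self.delivered = {}, deque(), 0

    def run(self):
        while self.queue:
            src, dst, msg = self.queue.popleft()
            self.delivered += 1
            self.nodes[dst].receive(src, *msg)

class Node:
    def __init__(self, key, net):
        self.key, self.net = key, net
        self.s = "out"                      # one of: out, ins, in, del
        self.l = self.r = None
        self.lseq, self.rseq = 0, None
        net.nodes[key] = self

    def send(self, dst, *msg):
        self.net.queue.append((self.key, dst, msg))

    def create(self):                       # (A1) first node of a new list
        self.l = self.r = self.key
        self.s, self.lseq, self.rseq = "in", 0, 0

    def insert(self, p, q):                 # (A2)
        if self.s != "out" or not between(self.key, p, q):
            raise ValueError("cannot insert between p and q")
        self.l, self.r, self.s = p, q, "ins"
        self.send(p, "SetR", self.key, q, self.lseq)

    def delete(self):                       # (A3)
        if self.s != "in":
            raise ValueError("node is not inserted")
        if self.r == self.key:              # last node in the list
            self.s = "out"
        else:
            self.s = "del"
            self.send(self.l, "SetR", self.r, self.key, self.rseq + 1)

    def receive(self, src, kind, *args):
        if kind == "SetR":                  # (A4)
            rnew, rcur, rnewseq = args
            if self.s == "in" and self.r == rcur:
                if rnew == src:             # insertion case
                    self.send(self.r, "SetL", rnew, self.rseq + 1)
                else:                       # deletion case
                    self.send(rnew, "SetL", self.key, rnewseq)
                self.send(src, "SetRAck", self.rseq + 1)
                self.r, self.rseq = rnew, rnewseq
            else:
                self.send(src, "SetRNak")
        elif kind == "SetRAck":             # (A5)
            if self.s == "ins":
                self.s, self.rseq = "in", args[0]
            elif self.s == "del":
                self.s = "out"
        elif kind == "SetRNak":             # (A6) the application retries later
            if self.s == "ins":
                self.s = "out"
            elif self.s == "del":
                self.s = "in"
        elif kind == "SetL":                # (A7)
            lnew, seq = args
            if self.lseq < seq:
                self.l, self.lseq = lnew, seq

net = Net()
a, b, c = Node(10, net), Node(20, net), Node(15, net)
a.create()
b.insert(10, 10); net.run()                 # ring: 10 -> 20 -> 10
c.insert(10, 20); net.run()                 # ring: 10 -> 15 -> 20 -> 10
rights_after_insert = (a.r, c.r, b.r)       # right links after both insertions
c.delete(); net.run()                       # ring: 10 -> 20 -> 10 again
```

Each of the three operations in the example delivers exactly three messages (SetR, SetRAck, SetL), so `net.delivered` ends at 9.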
20. Traversals
Every inserted node can be looked up either rightward or leftward
Traversing rightward: easy
Traversing leftward: left links are not always correct
1. Node X visits q and fetches q.l (= p)
2. X visits p and fetches p.l and p.r (= u)
3. X detects that u was missed (because p.r != q) and visits u
[Figure: X traversing leftward over q, p and u, where q’s left link incorrectly points to p]
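The leftward traversal with correction can be sketched as below. The function names and the dictionary-of-dictionaries representation are hypothetical. The sketch assumes that right links are correct and that a stale left link still points to a live node further to the left, as in the example on the slide.

```python
def true_left(nodes, cur):
    """Find the real left neighbour of `cur`, starting from its possibly stale left link."""
    x = nodes[cur]["l"]
    # If x.r is not cur, some node was missed; follow right links to reach it.
    while nodes[x]["r"] != cur:
        x = nodes[x]["r"]
    return x

def traverse_left(nodes, start):
    """Return all nodes in leftward order, beginning at `start`."""
    order = [start]
    cur = true_left(nodes, start)
    while cur != start:
        order.append(cur)
        cur = true_left(nodes, cur)
    return order

# Ring p -> u -> q -> p, where q's left link still points to p (u was just inserted).
nodes = {
    "p": {"l": "q", "r": "u"},
    "u": {"l": "p", "r": "q"},
    "q": {"l": "p", "r": "p"},
}
order = traverse_left(nodes, "q")
```

The returned list is in key order going left (q, u, p), although the nodes are contacted in the order q, p, u as on the slide, because u is only discovered through p's right link.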
21. Insertion Retry Optimization
Insertion requires pointers to the immediate left and right nodes
When an inserting node receives SetRNak, the node retries
Optimization: SetRNak contains the pointer to the right node
Extra messages can be eliminated if p has not initiated deletion and u ∈ (p, p.r)
[Figure: v and u insert concurrently between p and q, and v’s SetR is accepted first]
Unoptimized retry by u: SetR, SetRNak, GetR, MyR(v), SetR(u, v), SetRAck, SetL
Optimized retry by u: SetR, SetRNak(v), SetR(u, v), SetRAck, SetL
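A rough sketch of the optimized retry follows. The function names, the dictionary fields and the `max_tries` bound are assumptions for illustration, and sequence numbers and the SetL step are omitted. The recipient attaches its current right node to SetRNak, and the inserting node reuses it only when its own key still falls in (p, p.r).

```python
def between(x, a, b):
    """True if x is in the open ring interval (a, b), going rightward from a."""
    return (a < x < b) if a < b else (x > a or x < b)

def handle_setr(p, new_right, expected_right):
    """Return (reply, hint); the hint is p's current right node on SetRNak."""
    if p["state"] == "in" and p["r"] == expected_right:
        p["r"] = new_right
        return ("SetRAck", None)
    hint = p["r"] if p["state"] == "in" else None   # no hint if p is leaving
    return ("SetRNak", hint)

def insert_with_retry(u_key, p, expected_right, max_tries=3):
    """Return the number of SetR attempts used, or None if a fresh lookup is needed."""
    for attempt in range(1, max_tries + 1):
        reply, hint = handle_setr(p, u_key, expected_right)
        if reply == "SetRAck":
            return attempt
        if hint is None or not between(u_key, p["key"], hint):
            return None          # u no longer belongs right after p
        expected_right = hint    # retry immediately with the updated right node
    return None

# v = 30 was inserted between p = 10 and q = 40 before u's SetR arrived.
p = {"key": 10, "state": "in", "r": 30}
attempts = insert_with_retry(20, p, expected_right=40)

# Here the new right node (15) lies to the left of u = 20, so u must look up again.
p2 = {"key": 10, "state": "in", "r": 15}
attempts2 = insert_with_retry(20, p2, expected_right=40)
```

In the first case the second SetR succeeds without the GetR/MyR round trip of the unoptimized sequence.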
22. Handling failure
So far, no failure is assumed
DDLL algorithm considers:
Crash failure
Omission failure
Timing failure
(Partly omitted in this presentation)
In an asynchronous network, it is impossible to distinguish slow nodes from failed nodes
Erroneously suspected nodes are temporarily removed but eventually recovered
23. Recovery | Basic
Each node maintains a neighbor node set N
N contains a sufficient number of left-side nodes
Each node u periodically finds the closest live left-side node v
u obtains v.r and v.rseq
If (v = u.l) ∧ (v.r = u) ∧ (v.rseq = u.lseq) then OK
Otherwise, start recovery
[Figure: B, between A and C, fails; C finds A as its closest live left-side node and repairs the link with SetR(C, B, ?)]
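The periodic check can be written as a small predicate. This is a sketch only: `closest_live_left`, `needs_recovery`, the ordering of the neighbor list (nearest first) and the `is_alive` callback are all assumptions, since the slides do not specify how liveness is probed.

```python
def closest_live_left(neighbors, is_alive):
    """Pick the nearest live node from a left-side neighbor list ordered nearest first."""
    for key in neighbors:
        if is_alive(key):
            return key
    return None

def needs_recovery(u, v_key, v_r, v_rseq):
    """Apply the slide's check: (v = u.l) and (v.r = u) and (v.rseq = u.lseq)."""
    ok = v_key == u["l"] and v_r == u["key"] and v_rseq == u["lseq"]
    return not ok

# C's left neighbor B has failed; A still believes its right node is B.
C = {"key": "C", "l": "B", "lseq": 5}
alive = {"A", "C"}
v = closest_live_left(["B", "A"], lambda k: k in alive)
recover = needs_recovery(C, v, v_r="B", v_rseq=5)

# Healthy case: the closest live left node is C's left link and points back at C.
healthy = needs_recovery({"key": "C", "l": "A", "lseq": 5}, "A", v_r="C", v_rseq=5)
```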
24. Recovery | Sequence Number (1)
Let’s consider the sequence number of the recovered link
Assigning C.lseq + 1?
[Figure: B, between A and C, fails; with C.lseq = i, C sends SetR(C, B, i+1) and the recovered link between A and C gets sequence number i+1]
25. Recovery | Sequence Number (2)
Subtle case:
X inserts between B and C
B fails while the SetL to C is still in transit
C starts recovery without noticing X
C sends SetR(C, B, i+1)
Both A and X have the same right node (C) and the same rseq (i+1)
C’s left link may roll back!
26. Recovery | Sequence Number (3)
Solution:
Extend the sequence number to a pair: (recovery-number, seq)
The recovery number is increased only on recovery
Left links do not roll back!
[Figure: the same scenario with extended sequence numbers; C recovers with SetR(C, B, (1, 0)), so the delayed SetL carrying (0, i+1) no longer competes with the recovered link]
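One natural way to compare the extended sequence numbers is lexicographic ordering on the pair, which Python tuples provide directly. The helper names below are hypothetical, and the lexicographic reading is an assumption consistent with the slide's example, in which (1, 0) supersedes (0, i+1).

```python
def is_newer(seq, lseq):
    """A SetL is accepted only if its (recovery-number, seq) pair exceeds lseq."""
    return seq > lseq   # tuples compare lexicographically

def next_recovery_seq(lseq):
    """Sequence number for a recovered link: bump the recovery number, reset seq."""
    recovery_number, _ = lseq
    return (recovery_number + 1, 0)

i = 7
c_lseq = (0, i)                          # C's lseq before B failed
recovered = next_recovery_seq(c_lseq)    # used in SetR(C, B, (1, 0))
delayed_setl = (0, i + 1)                # X's SetL, still in transit

stale = not is_newer(delayed_setl, recovered)
```

With plain integers, the recovered link and X's delayed SetL would both carry i+1 and could not be ordered; with pairs, anything issued before the recovery compares lower than (1, 0).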
28. Evaluation
Algorithms compared:
DDLL (without optimization)
DDLL (with optimization)
Atomic Ring Maintenance (distributed locking)
A. Ghodsi, “Distributed k-ary System: Algorithms for distributed hash tables,” PhD Dissertation, KTH Royal Institute of Technology, 2006.
Li’s algorithm (distributed locking, no finger table)
X. Li et al., “Concurrent maintenance of rings,” Distributed Comp., vol. 19, no. 2, pp. 126–148, 2006.
Chord (eventual consistency, no finger table)
I. Stoica et al., “Chord: A scalable peer-to-peer lookup protocol for internet applications,” IEEE/ACM Trans. on Net., vol. 11, no. 1, pp. 17–32, 2003.
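29. Eval | Insertion Sequence
[Figure: message sequences for inserting u between p and q in each algorithm]
Li’s: messages Join(u), Ack(p, q), Grant(u) and Done, with two lock (🔒) / unlock (🔓) pairs
Atomic Ring Maintenance: messages JoinReq, JoinPoint, NewSucc, NewSuccAck and JoinDone, with two lock (🔒) / unlock (🔓) pairs
DDLL: messages SetR, SetRAck and SetL, with no locks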
30. Eval | Time for Concurrent Insertion
Simulated on a discrete event simulator
Insert an initial node
Insert n nodes in parallel (n = 1 to 100)
Measured the time required for all links to converge
Time includes lookup messages for finding the node insertion position
Time unit = one-way message transmission time
DDLL(Opt) converges quickly
[Figure: Time to converge. x-axis: # of simultaneously inserting nodes (0 to 100); y-axis: time (0 to 120); series: DDLL(Opt), DDLL(NoOpt), Atomic, Li's, Chord]
31. Eval | # of Msgs for Concurrent Insertion
Measured the # of messages required for all links to converge
DDLL(Opt) uses fewer messages
[Figure: # of messages to converge. x-axis: # of simultaneously inserting nodes (0 to 100); y-axis: # of messages (x1000, 0 to 5); series: DDLL(Opt), DDLL(NoOpt), Atomic, Li's, Chord]
33. Conclusion
DDLL algorithm for constructing distributed doubly linked lists
No distributed locking
Right links are always correct; left links converge quickly
Maintains lookup consistency (in the absence of failure)
More efficient than conventional algorithms
Recovery procedure is provided
No FIFO channel is required
Correctness proofs for the insertion and deletion procedures
DDLL is suitable for ring-based structured P2P networks
Real example: DDLL is used as a foundation of the Skip Graph and Chord# implementations in the PIAX P2P platform
35. Recovery | Sequence Number (4)
X is excluded from the linked list but eventually returns
[Figure: after C’s recovery, X rejoins with SetR(X, C, (0, 0)), which is acknowledged with SetRAck((1, 1))]
36. DDLL pseudo code
process u
var s: {out, ins, in, del}
    l, r: {pointer to a node or nil}
    lseq, rseq: {integer or nil}
init s = out; l = r = nil; lseq = 0; rseq = nil
begin
  {Create a linked list}
  (A1) receive Create() from app →
    l, r, s, lseq, rseq := u, u, in, 0, 0
  {Insert between p and q}
  [] (A2) receive Insert(p, q) from app →
    if (s ≠ out ∨ u ∉ (p, q)) then error; fi
    l, r, s := p, q, ins
    send SetR(u, r, lseq) to l
  {Delete}
  [] (A3) receive Delete() from app →
    if (s ≠ in) then error
    else if (u = r) then {in case of the last node}
      s := out
    else s := del; send SetR(r, u, rseq + 1) to l; fi
  [] (A4) receive SetR(rnew, rcur, rnewseq) from v →
    if (s = in ∧ r = rcur) then
      if (rnew = v) then {insertion case}
        send SetL(rnew, rseq + 1) to r
      else {deletion case}
        send SetL(u, rnewseq) to rnew; fi
      send SetRAck(rseq + 1) to v
      r, rseq := rnew, rnewseq
    else send SetRNak() to v; fi
  [] (A5) receive SetRAck(rnewseq) from v →
    if (s = ins) then
      s, rseq := in, rnewseq
    else if (s = del) then
      s := out; fi
  [] (A6) receive SetRNak() from v →
    if (s = ins) then
      s := out; error {app retries insertion later}
    else if (s = del) then
      s := in; error; fi {app retries deletion later}
  [] (A7) receive SetL(lnew, seq) from v →
    if (lseq < seq) then l, lseq := lnew, seq; fi
end
Fig. 1: DDLL algorithm (without optimization)
[...] are executed.
(A2) u sets u’s left link and right link to p and [...]
[...]
u (as the new left node) and p.rseq + 1 (= i+1) (as the
sequence number of the SetL message). Next, p sends
a SetRAck message to u to notify that the insertion
was successful. Because left(q) is changed from p to u,
the incremented right sequence number for q should be
transferred from p to u. For this purpose, the SetRAck
message contains p.rseq + 1 (= i+1). Finally, p changes
p.r to u and p.rseq to 0 (rnewseq). Because u’s right link
has already been set to q, the rightward linked list is
never interrupted, even for a moment. Note that at this
moment, p.rseq = u.lseq holds.
(A5) On receiving the SetRAck message, u confirms
that u is successfully inserted. Node u updates u.s to
in to indicate that u is inserted, and sets u.rseq to i+1.
(A7) On receiving the SetL message, q compares the
sequence number of the SetL message with q.lseq. If the
former is larger (we assume this case), q updates q.l to
u and q.lseq to i+1. Otherwise, q ignores the message.
In the scenario above, it is assumed that a SetRAck
message is sent to u in A4. If a SetRNak message is
sent (i.e., in the case of insertion failure), then (A6) u.s