Constructing Distributed Doubly Linked Lists without Distributed Locking

IEEE Peer-to-Peer Conference 2015
Sep 23rd–24th, 2015
Kota Abe, Osaka City University / NICT, Japan
Mikio Yoshida, BBR Inc., Japan

Outline
Background
What is a distributed doubly linked list
Conventional approaches
The DDLL algorithm
Procedure for node insertion, deletion and traversal
Procedure for recovery from failure
Evaluation
Comparison with conventional algorithms
Conclusion


Distributed Doubly Linked List
aka Bidirectional Ring
Commonly used in structured P2P networks
Chord, Chord#, Skip Graph, SkipNet, etc.
Structure
Pointer (e.g. IP address) to the next (successor) node and previous (predecessor) node
We call these the right and left pointers
Sorted by node-specific key
Circular
[Figure: a ring of eight nodes with keys 0, 10, 20, 30, 40, 50, 60, 70]

Maintaining Distributed Doubly Linked List
Challenges
Nodes are distributed and may be simultaneously and independently inserted and deleted
Nodes may fail
[Figure: the four operations to support: Insertion, Deletion, Recovery, Traversal]

Conventional Approaches (1/2)
Eventual Consistency Approach
Node insertion and deletion temporarily break the list structure
Stabilizing procedure recovers the structure
Example: Chord
Distributed Locking Approach
Use a lock 🔒 to mutually exclude node insertion / deletion
Example: Atomic Ring Maintenance (Ghodsi)
[Figure: insertion of u between p and q under each approach; the locking example uses JoinReq, JoinPoint, NewSucc, NewSuccAck, and JoinDone messages while locks are acquired and released]

Conventional Approaches (2/2)
Eventual Consistency Approach
Pros 👍
Easy to recover from failure
Cons 👎
No lookup consistency: lookup results may differ depending on the querying node
Distributed Locking Approach
Pros 👍
Lookup consistency
Cons 👎
A lock blocks other node insertions / deletions
When a node fails, locking duration may be quite long
Recovery procedure is rather complicated
Locks are released by timeout, which may be premature
→ Locks should not be used if possible


Our Contribution: DDLL Algorithm
DDLL = Distributed algorithm for constructing distributed doubly linked lists
Acronym of “Distributed Doubly Linked List”
Guarantees lookup consistency without using distributed locking (in the absence of failures)
Simple and efficient
Proved correct (insertion and deletion procedures)
Practical
Works with non-FIFO channels (e.g. UDP)
Used in our PIAX P2P platform as a foundation of the Skip Graph and Chord# implementations

Node Insertion
u is going to be inserted between p and q
(1) u.l := p, u.r := q
(2) Update right link: change p’s right link to u
(3) Update left link: change q’s left link to u
[Figure: four snapshots of p, u, and q as each step is applied]

Updating Right Link (1/3)
We want to change p’s right link only if there is no conflict
Conflicts
Another node (v) has been inserted between p and q
p has been deleted
q has been deleted
[Figure: one diagram per conflict case]

SetR message is used for updating a right link
SetR message contains:
new right node
expected right node of the recipient node
When a SetR request is accepted, p returns a SetRAck message
Otherwise, p returns a SetRNak message
Example: u sends SetR(u, q) to p
“Please change your right link to me (u) if your right link still points to q and you have not initiated deletion”
p replies SetRAck: “Ok!”
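
The SetR rule on this slide is essentially a compare-and-set on p's right link. The following is a minimal Python sketch of that check only; the `RightLinkNode` class, its `deleting` flag, and the `handle_set_r` helper are illustrative names that do not appear in the slides, and sequence numbers are left out.

```python
from dataclasses import dataclass


@dataclass
class RightLinkNode:
    name: str
    right: str              # name of the current right node
    deleting: bool = False  # True once this node has initiated its own deletion


def handle_set_r(p: RightLinkNode, new_right: str, expected_right: str) -> str:
    """Apply a SetR request at p and return the name of the reply message."""
    if p.right == expected_right and not p.deleting:
        p.right = new_right
        return "SetRAck"
    return "SetRNak"


# v and u both try to insert themselves between p and q.
p = RightLinkNode("p", right="q")
first = handle_set_r(p, new_right="v", expected_right="q")   # p.r is still q
second = handle_set_r(p, new_right="u", expected_right="q")  # p.r is now v
print(first, second, p.right)
```

Only the first request is acknowledged; the second one is refused because the expected right node no longer matches, so no lock is needed to serialize the two insertions.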

Right links are always correct without using locking
Conflict case example:
Another node v has been inserted between p and q
u sends SetR(u, q) to p
p.r != q, so p replies SetRNak (“Sorry!”)

Updating Left Link (1/3)
[Figure: message sequence and topology change for two insertions: SetR(u, q) and SetR(v, q) are each answered with SetRAck, and SetL(u) and SetL(v) are then sent]
Problem:
Multiple SetL messages arrive from different nodes in arbitrary order (because we do not want to use locking)
Node must determine which SetL message is newer

Solution:
SetL message contains a sequence number (seq)
Each node holds a sequence number for its right node (rseq)
rseq is transferred using SetRAck
Each node holds the max sequence number of SetL messages received so far (lseq)
SetL message is accepted only if msg.seq > lseq
[Figure: example with nodes p, u, v, q in which rseq and lseq advance from 0 to 1 to 2 through SetRAck(1), SetL(u, 1), SetRAck(2), and SetL(u, 2)]
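
The acceptance rule for SetL can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: the `LeftLinkNode` class and `handle_set_l` function are hypothetical names, and only the receiving side (q) is modeled.

```python
from dataclasses import dataclass


@dataclass
class LeftLinkNode:
    name: str
    left: str      # name of the current left node
    lseq: int = 0  # largest SetL sequence number accepted so far


def handle_set_l(q: LeftLinkNode, new_left: str, seq: int) -> bool:
    """Accept the SetL message only if it is newer than anything seen so far."""
    if seq > q.lseq:
        q.left, q.lseq = new_left, seq
        return True
    return False


q = LeftLinkNode("q", left="p")
accepted_v = handle_set_l(q, "v", 2)  # the newer message arrives first
accepted_u = handle_set_l(q, "u", 1)  # the older message arrives late
print(accepted_v, accepted_u, q.left, q.lseq)
```

Because the late message carries a smaller sequence number, q keeps v as its left node regardless of delivery order.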

How our scheme solves the previous case
[Figure: message sequence and topology change with SetR(u, q, 0), SetRAck(1), SetL(u, 1), SetR(v, q), SetRAck(2), and SetL(v, 2); rseq goes 0, 1, 2 and lseq goes from 0 to 2]
The late SetL(u, 1) message is stale and ignored
Lock is not necessary!

Node Insertion Sequence
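Message Sequence: u sends SetR(u, q, 0) to p; p replies SetRAck(i+1) to u and sends SetL(u, i+1) to q
Topology Change: before the insertion, p.rseq = q.lseq = i; afterwards, p.rseq = u.lseq = 0 and u.rseq = q.lseq = i+1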

Node Deletion Sequence
Message Sequence: u sends SetR(q, u, i2+1) to p; p replies SetRAck(i1+1) to u and sends SetL(p, i2+1) to q
Topology Change: before the deletion, p.rseq = u.lseq = i1 and u.rseq = q.lseq = i2; afterwards, p.rseq = q.lseq = i2+1
i1+1 is not used

Insertion and Deletion
3 messages are required for insertion/deletion
A node is atomically inserted/deleted when its SetR message is accepted
If a SetRNak message is received, the application retries insertion/deletion
Right links are always correct
Left links are correct when there is no SetL message in transit
No distributed locking
Does not require FIFO channels (UDP friendly)

Traversals
Every inserted node can be looked up either rightward or leftward
Traversing rightward: easy
Traversing leftward: left links are not always correct
Example: u has been inserted between p and q, but q.l still points to p
1. Node X visits q and fetches q.l (= p)
2. X visits p and fetches p.l and p.r (= u)
3. X detects that u was missed (because p.r != q) and X visits u
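
One way to picture the leftward traversal is the Python sketch below. It is not the authors' code: the dictionary layout and the `traverse_left` function are assumptions for illustration, and it assumes that a stale left link still points to a live node somewhere to the left, so that following the (always correct) right links eventually reaches the true predecessor.

```python
def traverse_left(nodes, start, stop):
    """Return keys in leftward order from start to stop, repairing skipped nodes."""
    order = [start]
    cur = start
    while cur != stop:
        cand = nodes[cur]["l"]          # may be stale
        # If cand's right link is not cur, some node was missed in between:
        # walk rightward from cand until the real predecessor of cur is found.
        while nodes[cand]["r"] != cur:
            cand = nodes[cand]["r"]
        order.append(cand)
        cur = cand
    return order


# Ring p -> u -> q -> p, where q.l still points to p because SetL is in transit.
ring = {
    "p": {"l": "q", "r": "u"},
    "u": {"l": "p", "r": "q"},
    "q": {"l": "p", "r": "p"},
}
print(traverse_left(ring, "q", "p"))
```

Starting at q, the walker fetches q.l = p, notices that p.r is u rather than q, and therefore reports u before p.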

Insertion Retry Optimization
Insertion requires pointers to the immediate left and right nodes
When an inserting node receives SetRNak, the node retries
Optimization: SetRNak contains the pointer to the right node
Extra messages can be eliminated if p has not initiated deletion AND u ∈ (p, p.r)
[Figure: message sequences when v and u insert concurrently between p and q. Unoptimized: SetR, SetRNak, GetR, MyR(v), SetR(u, v), SetRAck, SetL. Optimized: SetR, SetRNak(v), SetR(u, v), SetRAck, SetL]
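
The condition u ∈ (p, p.r) is a membership test on a circular key space. A minimal Python sketch follows, assuming integer keys; `in_open_interval` and `can_retry_at_same_node` are hypothetical helper names, and how the deletion status of p is made known to the retrying node is not specified on the slide.

```python
def in_open_interval(x: int, a: int, b: int) -> bool:
    """True if key x lies strictly between a and b going rightward on the ring."""
    if a < b:
        return a < x < b
    # a >= b: the interval wraps around; a == b means the whole ring except a.
    return x > a or x < b


def can_retry_at_same_node(u: int, p: int, p_right: int, p_deleting: bool) -> bool:
    """Condition from the slide for skipping the extra lookup messages."""
    return (not p_deleting) and in_open_interval(u, p, p_right)


# p = 10 and q = 40; v = 30 was inserted first, so SetRNak carries p.r = 30.
print(can_retry_at_same_node(20, 10, 30, False))  # 20 is between 10 and 30
print(can_retry_at_same_node(35, 10, 30, False))  # 35 now belongs right of v
```

When the check succeeds, the retrying node can send SetR(u, v) to p immediately instead of first asking p for its right node.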

Handling failure
So far, no failure is assumed
DDLL algorithm considers:
Crash failure
Omission failure
Timing failure
(Some of the above is omitted in this presentation)
In an asynchronous network, it is impossible to distinguish slow nodes from failed nodes
Erroneously suspected nodes are temporarily removed but eventually recovered

Recovery | Basic
Each node maintains a neighbor node set N
N contains a sufficient number of left-side nodes
Each node u periodically finds the closest live left-side node v
u obtains v.r and v.rseq
If (v = u.l) ∧ (v.r = u) ∧ (v.rseq = u.lseq) then OK
Otherwise, start recovery
[Figure: nodes A, B, C; C repairs its left link with SetR(C, B, ?), where the sequence number is still to be decided]
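
The periodic check can be sketched in Python as follows. The `PeerState` record, the `alive` flag (standing in for a failure detector), and both helper functions are illustrative assumptions; the neighbor set N is modeled as a list of left-side candidates ordered nearest first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PeerState:
    name: str
    l: str
    r: str
    lseq: int
    rseq: int
    alive: bool = True  # stand-in for the result of a liveness probe


def closest_live_left(neighbors: List[str], peers: Dict[str, PeerState]) -> Optional[PeerState]:
    """Return the nearest left-side neighbor that still responds."""
    for name in neighbors:
        if peers[name].alive:
            return peers[name]
    return None


def left_link_ok(u: PeerState, v: PeerState) -> bool:
    """(v = u.l) and (v.r = u) and (v.rseq = u.lseq)."""
    return v.name == u.l and v.r == u.name and v.rseq == u.lseq


peers = {
    "A": PeerState("A", l="C", r="B", lseq=0, rseq=3),
    "B": PeerState("B", l="A", r="C", lseq=3, rseq=4),
    "C": PeerState("C", l="B", r="A", lseq=4, rseq=0),
}
c_neighbors = ["B", "A"]  # C's neighbor set N, nearest first

ok_before = left_link_ok(peers["C"], closest_live_left(c_neighbors, peers))
peers["B"].alive = False  # B crashes
ok_after = left_link_ok(peers["C"], closest_live_left(c_neighbors, peers))
print(ok_before, ok_after)
```

After B crashes, the closest live left-side node of C is A, whose right link still points to B, so the check fails and C would start recovery.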

Recovery | Sequence Number (1)
Let’s consider the sequence number of the recovered link
Assigning C.lseq + 1?
[Figure: nodes A, B, C with sequence numbers i and i+1; C sends SetR(C, B, i+1)]

Subtle Case
X inserts between B and C
B fails while SetL to C is still in transmission
C starts recovery without noticing X, sending SetR(C, B, i+1)
Both A and X have the same right node (C) and the same rseq (i+1)
C’s left link may roll back!

Solution:
Extend sequence number: (recovery-number, seq)
Recovery number is increased only on recovery
Left links do not roll back!
[Figure: the subtle case with extended sequence numbers; C recovers with SetR(C, B, (1, 0)) while the SetL message in transit carries (0, i+1)]
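
A small Python sketch of how extended sequence numbers might be compared is given below. It assumes the pair is ordered lexicographically with the recovery number first, which the slide implies but does not state outright; Python tuples happen to compare this way. The function name and the value i = 5 are illustrative.

```python
from typing import Tuple

ExtSeq = Tuple[int, int]  # (recovery-number, seq)


def accept_set_l(lseq: ExtSeq, msg_seq: ExtSeq) -> bool:
    """Accept a SetL message only if its extended sequence number is larger."""
    return msg_seq > lseq  # tuple comparison: recovery number first, then seq


i = 5
c_lseq: ExtSeq = (0, i)

# C recovers its left link toward A using (1, 0).
recovered_accepted = accept_set_l(c_lseq, (1, 0))
c_lseq = (1, 0)

# The SetL that was sent before the failure, carrying (0, i+1), arrives late.
late_accepted = accept_set_l(c_lseq, (0, i + 1))
print(recovered_accepted, late_accepted)
```

Whatever value i has, any sequence number issued before the recovery compares smaller than (1, 0), so the late message cannot overwrite the recovered left link.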


Evaluation
Comparison
DDLL (without optimization)
DDLL (with optimization)
Atomic Ring Maintenance (distributed locking)
A. Ghodsi, “Distributed k-ary System: Algorithms for distributed hash tables,” PhD Dissertation, KTH Royal Institute of Technology, 2006.
Li’s algorithm (distributed locking, no finger table)
X. Li, et al., “Concurrent maintenance of rings,” Distributed Comp., vol. 19, no. 2, pp. 126–148, 2006.
Chord (eventual consistency, no finger table)
I. Stoica, et al., “Chord: A scalable peer-to-peer lookup protocol for internet applications,” IEEE/ACM Trans. on Net., vol. 11, no. 1, pp. 17–32, 2003.

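Eval | Insertion Sequence
[Figure: message sequences for inserting u between p and q under three algorithms]
Li’s: messages Join(u), Ack(p, q), Grant(u), Done, with locks acquired and released
Atomic Ring Maintenance: messages JoinReq, JoinPoint, NewSucc, NewSuccAck, JoinDone, with locks acquired and released
DDLL: messages SetR, SetRAck, SetL, with no locks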

Eval | Time for Concurrent Insertion
Simulated on a discrete event simulator
Insert an initial node
Insert n nodes in parallel (n = 1 to 100)
Measured time required to converge all links
Time includes lookup messages for searching node insertion position
[Figure: Time to converge. x-axis: # of simultaneously inserting nodes (0 to 100); y-axis: time (0 to 120), time unit = one-way message transmission time; series: DDLL(Opt), DDLL(NoOpt), Atomic, Li's, Chord]
DDLL(Opt) converges quickly

Eval | # of Msgs for Concurrent Insertion
Measured # of messages required to converge all links
[Figure: # of messages to converge. x-axis: # of simultaneously inserting nodes (0 to 100); y-axis: # of messages (x1000, 0 to 5); series: DDLL(Opt), DDLL(NoOpt), Atomic, Li's, Chord]
DDLL(Opt) uses fewer messages


Conclusion
DDLL algorithm for constructing distributed doubly linked lists
No distributed locking
Right links are always correct, left links converge quickly
Maintains lookup consistency (in the absence of failures)
More efficient than conventional algorithms
Recovery procedure is provided
No FIFO channel is required
Correctness proofs for insertion and deletion procedures
DDLL is suitable for ring-based structured P2P networks
Real example: DDLL is used as a foundation of the Skip Graph and Chord# implementations in the PIAX P2P platform

X is excluded from the linked list but eventually returns
[Figure: nodes A, B, X, C with extended sequence numbers after recovery; X rejoins with SetR(X, C, (0, 0)) and receives SetRAck((1, 1))]

DDLL pseudo code
process u
var s: {out, ins, in, del}
    l, r: {pointer to a node or nil}
    lseq, rseq: {integer or nil}
init s = out; l = r = nil; lseq = 0; rseq = nil
begin
  {Create a linked list}
  (A1) receive Create() from app →
    l, r, s, lseq, rseq := u, u, in, 0, 0
  {Insert between p and q}
  [] (A2) receive Insert(p, q) from app →
    if (s ≠ out ∨ u ∉ (p, q)) then error; fi
    l, r, s := p, q, ins
    send SetR(u, r, lseq) to l
  {Delete}
  [] (A3) receive Delete() from app →
    if (s ≠ in) then error
    else if (u = r) then {in case of the last node}
      s := out
    else s := del; send SetR(r, u, rseq + 1) to l; fi
  [] (A4) receive SetR(rnew, rcur, rnewseq) from v →
    if (s = in ∧ r = rcur) then
      if (rnew = v) then {insertion case}
        send SetL(rnew, rseq + 1) to r
      else {deletion case}
        send SetL(u, rnewseq) to rnew; fi
      send SetRAck(rseq + 1) to v
      r, rseq := rnew, rnewseq
    else send SetRNak() to v; fi
  [] (A5) receive SetRAck(rnewseq) from v →
    if (s = ins) then
      s, rseq := in, rnewseq
    else if (s = del) then
      s := out; fi
  [] (A6) receive SetRNak() from v →
    if (s = ins) then
      s := out; error {app retries insertion later}
    else if (s = del) then
      s := in; error; fi {app retries deletion later}
  [] (A7) receive SetL(lnew, seq) from v →
    if (lseq < seq) then l, lseq := lnew, seq; fi
end
Fig. 1: DDLL algorithm (without optimization)
are executed.
(A2) u sets u’s left link and right link to p and [...]
u (as the new left node) and p.rseq +1(= i+1) (as the
sequence number of the SetL message). Next, p sends
a SetRAck message to u to notify that the insertion
was successful. Because left(q) is changed from p to u,
the incremented right sequence number for q should be
transferred from p to u. For this purpose, the SetRAck
message contains p.rseq +1(= i+1). Finally, p changes
p.r to u and p.rseq to 0 (rnewseq). Because u’s right link
has already been set to q, the rightward linked list is
never interrupted, even for a moment. Note that at this
moment, p.rseq = u.lseq holds.
(A5) On receiving the SetRAck message, u conﬁrms
that u is successfully inserted. Node u updates u.s to
in to indicate that u is inserted, and sets u.rseq to i+1.
(A7) On receiving the SetL message, q compares the
sequence number of the SetL message with q.lseq. If the
former is larger (we assume this case), q updates q.l to
u and q.lseq to i+1. Otherwise, q ignores the message.
In the scenario above, it is assumed that a SetRAck
message is sent to u in A4. If a SetRNak message is
sent (i.e., in the case of insertion failure), then (A6) u.s
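
For readers who want to experiment with the pseudo code, the following is a rough Python transcription of actions A1 to A7 running over an in-memory message queue. It is a sketch and not the PIAX implementation: the `Node` class, the `deque`-based network, integer keys, `assert` in place of `error`, and the two branches of A6 (inferred from A5 and the retry comments) are all assumptions. Failures and the retry optimization are not modeled.

```python
from collections import deque


def in_interval(x, a, b):
    """True if key x lies strictly between a and b going rightward on the ring."""
    return a < x < b if a < b else (x > a or x < b)


class Node:
    def __init__(self, key, net):
        self.key, self.net = key, net
        self.s, self.l, self.r = "out", None, None  # s is one of out/ins/in/del
        self.lseq, self.rseq = 0, None

    def send(self, dst, msg, *args):
        self.net.append((self.key, dst, msg, args))

    def create(self):                                    # A1
        self.l = self.r = self.key
        self.s, self.lseq, self.rseq = "in", 0, 0

    def insert(self, p, q):                              # A2
        assert self.s == "out" and in_interval(self.key, p, q)
        self.l, self.r, self.s = p, q, "ins"
        self.send(p, "SetR", self.key, q, self.lseq)

    def delete(self):                                    # A3
        assert self.s == "in"
        if self.r == self.key:                           # last node in the list
            self.s = "out"
        else:
            self.s = "del"
            self.send(self.l, "SetR", self.r, self.key, self.rseq + 1)

    def on_SetR(self, v, rnew, rcur, rnewseq):           # A4
        if self.s == "in" and self.r == rcur:
            if rnew == v:                                # insertion case
                self.send(self.r, "SetL", rnew, self.rseq + 1)
            else:                                        # deletion case
                self.send(rnew, "SetL", self.key, rnewseq)
            self.send(v, "SetRAck", self.rseq + 1)
            self.r, self.rseq = rnew, rnewseq
        else:
            self.send(v, "SetRNak")

    def on_SetRAck(self, v, rnewseq):                    # A5
        if self.s == "ins":
            self.s, self.rseq = "in", rnewseq
        elif self.s == "del":
            self.s = "out"

    def on_SetRNak(self, v):                             # A6 (branches assumed)
        self.s = "out" if self.s == "ins" else "in"      # the app retries later

    def on_SetL(self, v, lnew, seq):                     # A7
        if self.lseq < seq:
            self.l, self.lseq = lnew, seq


def run(net, nodes):
    """Deliver queued messages until the network is quiet."""
    while net:
        src, dst, msg, args = net.popleft()
        getattr(nodes[dst], "on_" + msg)(src, *args)


net = deque()
nodes = {k: Node(k, net) for k in (10, 20, 25, 30)}
nodes[10].create()
nodes[30].insert(10, 10); run(net, nodes)
nodes[20].insert(10, 30)                  # issued concurrently with the next one
nodes[25].insert(10, 30); run(net, nodes)
states = {k: n.s for k, n in nodes.items()}  # 25 was refused with SetRNak
nodes[25].insert(20, 30); run(net, nodes)    # application-level retry
ring = [10]
while nodes[ring[-1]].r != 10:
    ring.append(nodes[ring[-1]].r)
print(states, ring)
```

In this run, nodes 20 and 25 both ask node 10 to insert them before 30; the second request is refused, and node 25 succeeds once it retries with 20 as its left node. The queue here delivers in FIFO order only to keep the run deterministic.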
