Slides for Chapter 8: 
Coordination and Agreement 
From Coulouris, Dollimore and Kindberg 
Distributed Systems: 
Concepts and Design 
Edition 4, © Pearson Education 2005
Topics
• A set of processes coordinate their actions: how can they agree on one or more values?
• Avoid a fixed master-slave relationship, since a fixed master is a single point of failure.
• Distributed mutual exclusion for resource sharing:
 • When a collection of processes share resources, mutual exclusion is needed to prevent interference and ensure consistency (critical section).
 • No shared variables or facilities provided by a single local kernel can solve this; a solution based solely on message passing is required.
• An important factor to consider when designing an algorithm is failure.
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005
Distributed Mutual Exclusion
• Application-level protocol for executing a critical section:
 • enter() // enter critical section; block if necessary
 • resourceAccess() // access shared resources
 • exit() // leave critical section; other processes may enter
Essential requirements:
• ME1 (safety): at most one process may execute in the critical section at a time.
• ME2 (liveness): requests to enter and exit the critical section eventually succeed.
• ME3 (ordering): if one request to enter the CS happened-before another, then entry to the CS is granted in that order.
ME2 implies freedom from both deadlock and starvation. Freedom from starvation is a fairness condition on the order in which processes enter the critical section. It is not possible to use physical request times to order entries, because there is no global clock; instead, the happened-before ordering of request messages is usually used.
Performance Evaluation
• Bandwidth consumption, which is proportional to the number of messages sent in each entry and exit operation.
• The client delay incurred by a process at each entry and exit operation.
• The throughput of the system: the rate at which the collection of processes as a whole can access the critical section. This is measured via the synchronization delay between one process exiting the critical section and the next process entering it; the shorter the delay, the greater the throughput.
Central Server Algorithm
• The simplest way to grant permission to enter the critical section is to employ a server.
• A process sends a request message to the server and awaits a reply from it.
• The reply constitutes a token signifying permission to enter the critical section.
• If no other process holds the token at the time of the request, the server replies immediately with the token.
• If the token is currently held by another process, the server does not reply but queues the request.
• On exiting the critical section, the client sends a message to the server, giving it back the token.
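The server's behaviour above can be sketched in a few lines of Python (a minimal simulation with illustrative names, not code from the book):

```python
from collections import deque

class TokenServer:
    """Central server managing the mutual-exclusion token (a sketch)."""
    def __init__(self):
        self.holder = None        # process currently holding the token
        self.queue = deque()      # FIFO queue of pending requests

    def request(self, pid):
        """Grant the token at once if it is free; otherwise queue silently."""
        if self.holder is None:
            self.holder = pid
            return True           # reply sent immediately with the token
        self.queue.append(pid)    # no reply yet: request is queued
        return False

    def release(self, pid):
        """The holder returns the token; pass it to the oldest waiter."""
        assert self.holder == pid
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder        # process now granted the token, or None

s = TokenServer()
assert s.request("p1") is True    # token free: granted immediately
assert s.request("p2") is False   # token busy: queued
assert s.release("p1") == "p2"    # release passes the token to p2
```

Note that the queue preserves arrival order at the server, which is why ME1 and ME2 hold but ME3 (happened-before ordering) does not.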
Figure 12.2 
Central Server algorithm: managing a mutual exclusion token for a set 
of processes 
[Diagram: a server with a queue of requests, serving processes p1–p4; a process requests the token (1), the current holder releases it (2), and the server grants it to the next process (3).]
ME1 (safety) and ME2 (liveness) are satisfied, but ME3 (ordering) is not.
Bandwidth: entering takes two messages (a request followed by a grant), delayed by the round-trip time; exiting takes one release message and does not delay the exiting process.
Throughput is limited by the synchronization delay: the round trip of a release message followed by a grant message.
Ring-based Algorithm
• The simplest way to arrange mutual exclusion between N processes without requiring an additional process is to arrange them in a logical ring.
• Each process pi has a communication channel to the next process in the ring, p(i+1) mod N.
• The unique token takes the form of a message passed from process to process in a single direction around the ring.
• If a process does not require entry to the CS when it receives the token, it immediately forwards the token to its neighbour.
• A process that requires the token waits until it receives it, and then retains it.
• To exit the critical section, the process sends the token on to its neighbour.
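A minimal Python simulation of the token's circulation (illustrative names; not code from the book):

```python
# Which processes currently want the critical section:
wants_cs = {1: True, 3: True}
N = 4

def circulate(token_at, steps):
    """Pass the token around the ring; record the order of CS entries."""
    entries = []
    for _ in range(steps):
        if wants_cs.get(token_at):
            entries.append(token_at)       # retain token, use the CS, then...
            wants_cs[token_at] = False
        token_at = (token_at + 1) % N      # ...forward token to the neighbour
    return entries

# One full revolution serves every waiting process in ring order:
assert circulate(0, N) == [1, 3]
```

Entries follow ring order rather than request order, which is why ME3 fails here too.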
Figure 12.3 
A ring of processes transferring a mutual exclusion token 
[Diagram: processes p1, p2, ..., pN arranged in a ring, passing the token in one direction.]
ME1 (safety) and ME2 (liveness) are satisfied, but ME3 (ordering) is not.
Bandwidth: the token continuously consumes bandwidth, except when a process is inside the CS; exiting requires only one message.
Delay: the entry delay experienced by a process ranges from zero messages (it has just received the token) to N messages (it has just passed the token on).
Throughput: the synchronization delay between one exit and the next entry ranges from 1 message transmission (the next process in the ring wants entry) to N (the same process wants entry again).
Using Multicast and logical clocks
• Mutual exclusion between N peer processes, based upon multicast.
• Processes that require entry to a critical section multicast a request message, and can enter it only when all the other processes have replied to this message.
• The conditions under which a process replies to a request are designed to ensure that ME1, ME2 and ME3 are met.
• Each process pi keeps a Lamport clock. Messages requesting entry are of the form <T, pi>.
• Each process records its state, one of RELEASED, WANTED or HELD, in a variable state.
• If a process requests entry and all other processes are in state RELEASED, then all processes reply immediately.
• If some process is in state HELD, then that process will not reply until it has finished with the critical section.
• If some process is in state WANTED and has a smaller timestamp than the incoming request, it will queue the request until it has finished.
• If two or more processes request entry at the same time, then whichever request bears the lowest timestamp will be the first to collect N-1 replies.
Figure 12.4 
Ricart and Agrawala’s algorithm 
On initialization 
state := RELEASED; 
To enter the section 
state := WANTED; 
Multicast request to all processes; request processing deferred here 
T := request’s timestamp; 
Wait until (number of replies received = (N – 1)); 
state := HELD; 
On receipt of a request <Ti, pi> at pj (i ≠ j)
if (state = HELD or (state = WANTED and (T, pj) < (Ti, pi)))
then 
queue request from pi without replying; 
else 
reply immediately to pi; 
end if 
To exit the critical section 
state := RELEASED; 
reply to any queued requests;
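The deferral rule above can be sketched in Python; tuple comparison on (T, pid) pairs gives exactly the required total order (names are illustrative, not from the book):

```python
RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"

def on_request(state, own_req, incoming_req, deferred):
    """Decide whether pj replies to or defers pi's incoming request.
    own_req and incoming_req are (timestamp, pid) pairs; Python's tuple
    comparison breaks timestamp ties by process ID, as the algorithm needs."""
    if state == HELD or (state == WANTED and own_req < incoming_req):
        deferred.append(incoming_req)   # queue the request without replying
        return "deferred"
    return "reply"                      # reply immediately

q = []
# pj wants the CS with timestamp 34; pi's competing request bears 41:
assert on_request(WANTED, (34, 2), (41, 1), q) == "deferred"
# with the roles reversed, pj's own request is later, so it must reply:
assert on_request(WANTED, (41, 1), (34, 2), []) == "reply"
assert q == [(41, 1)]
```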
Figure 12.5 
Multicast synchronization 
[Diagram: p1 (timestamp 41) and p2 (timestamp 34) multicast requests concurrently; p3 replies to both immediately; p1 replies to p2, which enters the CS first.]
• P1 and P2 request the CS concurrently. P1's timestamp is 41 and P2's is 34. When P3 receives their requests, it replies immediately to both. When P2 receives P1's request, it finds that its own request has the lower timestamp, so it does not reply and holds P1's request in its queue. P1, however, replies to P2, so P2 enters the CS. After P2 finishes, it replies to P1, and P1 enters the CS.
• Granting entry takes 2(N-1) messages: N-1 to multicast the request and N-1 replies. Bandwidth consumption is high.
• Client delay is again one round-trip time.
• Synchronization delay is one message transmission time.
Maekawa’s voting algorithm
• It is not necessary for all of a process's peers to grant it access; it only needs permission from a subset of its peers, as long as the subsets used by any two processes overlap.
• Think of processes as voting for one another to enter the CS. A candidate process must collect sufficient votes to enter.
• Processes in the intersection of two candidates' sets of voters ensure the safety property ME1 by casting their votes for only one candidate at a time.
Maekawa’s voting algorithm 
• A voting set Vi is associated with each process pi:
 Vi ⊆ {p1, p2, ..., pN}, such that for all i, j = 1, 2, ..., N:
  pi ∈ Vi
  Vi ∩ Vj ≠ ∅
  |Vi| = K
  each process is contained in M of the voting sets Vi.
• There is at least one common member of any two voting sets, and all voting sets have the same size, for fairness.
• The optimal solution that minimizes K is K ≈ √N, with M = K.
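One well-known way to build such voting sets (not given in the slides) arranges N = k² processes in a k×k grid; each Vi is the union of pi's row and column, so any two sets intersect and |Vi| = 2k−1 ≈ 2√N, close to the √N optimum. A Python sketch:

```python
import math

def grid_voting_sets(N):
    """Row-plus-column voting sets for N = k*k processes (a sketch)."""
    k = math.isqrt(N)
    assert k * k == N, "this sketch assumes N is a perfect square"
    sets = []
    for i in range(N):
        r, c = divmod(i, k)
        row = {r * k + j for j in range(k)}    # pi's row in the grid
        col = {j * k + c for j in range(k)}    # pi's column in the grid
        sets.append(row | col)
    return sets

V = grid_voting_sets(9)                         # k = 3, |Vi| = 5
assert all(i in V[i] for i in range(9))         # pi ∈ Vi
assert all(len(v) == 5 for v in V)              # equal sizes, 2k - 1
assert all(V[i] & V[j] for i in range(9) for j in range(9))  # any two overlap
```

Any two sets overlap because pi's row always meets pj's column at one grid position.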
Figure 12.6 
Maekawa’s algorithm – part 1 
On initialization 
state := RELEASED; 
voted := FALSE; 
For pi to enter the critical section 
state := WANTED; 
Multicast request to all processes in Vi; 
Wait until (number of replies received = K); 
state := HELD; 
On receipt of a request from pi at pj 
if (state = HELD or voted = TRUE) 
then 
queue request from pi without replying; 
else 
send reply to pi; 
voted := TRUE; 
end if 
For pi to exit the critical section 
state := RELEASED; 
Multicast release to all processes in Vi; 
On receipt of a release from pi at pj 
if (queue of requests is non-empty) 
then 
remove head of queue – from pk, say; 
send reply to pk; 
voted := TRUE; 
else 
voted := FALSE; 
end if
Maekawa’s algorithm
• ME1 is met. If two processes could enter the CS at the same time, the processes in the intersection of their two voting sets would have to have voted for both; but the algorithm allows a process to cast at most one vote between successive receipts of a release message.
• It is deadlock-prone. For example, consider p1, p2 and p3 with V1={p1,p2}, V2={p2,p3}, V3={p3,p1}. If the three processes concurrently request entry to the CS, it is possible for p1 to reply to itself and hold off p2, for p2 to reply to itself and hold off p3, and for p3 to reply to itself and hold off p1. Each process has then received one of the two replies it needs, and none can proceed.
• If processes queue outstanding requests in happened-before order, ME3 is satisfied and the algorithm becomes deadlock-free.
• Bandwidth utilization is 2√N messages per entry to the CS and √N per exit.
• Client delay is the same as for Ricart and Agrawala's algorithm: one round-trip time.
• Synchronization delay is one round-trip time, which is worse than RA's one message transmission time.
Fault tolerance
• What happens when messages are lost?
• What happens when a process crashes?
• None of the algorithms that we have described would tolerate the loss of messages if the channels were unreliable.
 • The ring-based algorithm cannot tolerate a crash failure of any single process.
 • Maekawa's algorithm can tolerate some process crash failures: those of a crashed process that is not in any voting set that is required.
 • The central server algorithm can tolerate the crash failure of a client process that neither holds nor has requested the token.
 • The Ricart and Agrawala algorithm as we have described it can be adapted to tolerate the crash failure of such a process, by taking it to grant all requests implicitly.
Elections 
• An algorithm to choose a unique process to play a particular role is called an election algorithm. E.g., with a central server for mutual exclusion, one process must be elected as the server, and everybody must agree. If the server wishes to retire, another election is required to choose a replacement.
• Requirements:
 • E1 (safety): a participating process pi has electedi = ⊥ or electedi = P, where P is chosen as the non-crashed process at the end of the run with the largest identifier. (Concurrent elections are possible.)
 • E2 (liveness): all processes pi participate in the election and eventually either set electedi ≠ ⊥ or crash.
A ring-based election algorithm
• All processes are arranged in a logical ring.
• Each process has a communication channel to the next process in the ring.
• All messages are sent clockwise around the ring.
• Assume that no failures occur, and that the system is asynchronous.
• The goal is to elect a single coordinator process: the process with the largest identifier.
Figure 12.7 
A ring-based election in progress 
[Diagram: processes with identifiers 1, 3, 4, 9, 15, 17, 24 and 28 arranged in a ring; an election message carrying identifier 24 is in transit.]
1. Initially, every process is marked as a non-participant. Any process can begin an election.
2. The starting process marks itself as a participant and places its identifier in an election message to its neighbour.
3. When a process receives an election message, it compares the identifier in the message with its own. If the arrived identifier is larger, it passes the message on.
4. If the arrived identifier is smaller and the receiver is not a participant, it substitutes its own identifier in the message and forwards it. It does not forward the message if it is already a participant.
5. On forwarding an election message in either case, the process marks itself as a participant.
6. If the received identifier is that of the receiver itself, then this process's identifier must be the greatest, and it becomes the coordinator.
7. The coordinator marks itself as a non-participant, sets elected_i, and sends an elected message to its neighbour announcing its ID.
8. When a process receives an elected message, it marks itself as a non-participant, sets its variable elected_i, and forwards the message.
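The steps above can be simulated in a few lines of Python for a single initiator (a sketch using the common simplification that each hop forwards the larger of the message's identifier and the receiver's own; names are illustrative):

```python
def ring_election(ids, starter):
    """Simulate one ring election; ids[i] sits at ring position i."""
    n = len(ids)
    current = ids[starter]                # starter sends its own identifier
    pos, msgs = (starter + 1) % n, 0
    while True:
        msgs += 1                         # one election-message hop
        if ids[pos] == current:
            break                         # own ID came back: coordinator
        current = max(current, ids[pos])  # substitute the larger identifier
        pos = (pos + 1) % n
    msgs += n                             # elected message circles the ring
    return current, msgs

ids = [17, 24, 1, 28, 15, 9, 4, 3]        # identifiers as in Figure 12.7
winner, msgs = ring_election(ids, 0)      # process 17 starts the election
assert winner == 28                       # the largest identifier wins
assert msgs <= 3 * len(ids) - 1           # never more than 3N-1 messages
```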
A ring-based election in progress 
• Note: the election was started by process 17.
• The highest process identifier encountered so far is 24.
• Participant processes are shown darkened in the figure.
• E1 is met: all identifiers are compared, since a process must receive its own ID back before sending an elected message.
• E2 is also met, due to the guaranteed traversal of the ring.
• Tolerating no failures makes the ring algorithm of limited practical use.
• If only a single process starts an election, the worst-performing case is when its anti-clockwise neighbour has the highest identifier: N-1 messages are used to reach this neighbour, a further N messages are required before the neighbour receives its own identifier back and announces its election, and the elected message is then sent N times, making 3N-1 messages in all.
• Turnaround time is also 3N-1 sequential message transmission times.
The bully algorithm 
• Allows processes to crash during an election, although it assumes that message delivery between processes is reliable.
• Assumes the system is synchronous, so that timeouts can be used to detect process failures.
• Assumes each process knows which processes have higher identifiers, and that it can communicate with all such processes.
• Three types of messages:
 • Election is sent to announce an election. A process begins an election when it notices, through timeouts, that the coordinator has failed; given the synchrony assumption, a reliable timeout is T = 2T_trans + T_process from the time of sending.
 • Answer is sent in response to an election message.
 • Coordinator is sent to announce the identity of the elected process.
Figure 12.8 
The bully algorithm 
[Diagram, four stages, showing the election of coordinator p2 after the failure of p4 and then p3; faulty processes are shown coloured. Stage 1: p1 sends election messages to p2, p3 and p4; p2 and p3 answer. Stage 2: p2 and p3 in turn send election messages to the processes above them; p3 answers p2. Stage 3: p3 crashes, and p2's wait for a coordinator message times out. Stage 4: eventually p2 receives no answer to its new election message and sends coordinator messages.]
1. A process begins an election by sending an election message to those processes that have a higher ID, and awaits answer messages in response. If none arrives within time T, the process considers itself the coordinator and sends a coordinator message to all processes with lower identifiers. Otherwise, it waits a further time T' for a coordinator message to arrive; if none does, it begins another election.
2. If a process receives a coordinator message, it sets its variable elected_i to the ID of the coordinator.
3. If a process receives an election message, it sends back an answer message and begins another election, unless it has begun one already.
E1 may be broken if timeouts are inaccurate, or if a crashed process is replaced: suppose p3 crashes and is replaced by a new process; p2 may then set p3 as coordinator while p1 sets p2 as coordinator.
E2 is clearly met by the assumption of reliable message transmission.
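The takeover cascade can be condensed into a small recursive sketch (illustrative only; a real implementation interleaves timeouts and messages):

```python
def bully_election(alive, starter):
    """Return the coordinator the bully algorithm elects, given the set of
    live process IDs and the process that first notices the failure."""
    higher = [p for p in alive if p > starter]
    if not higher:
        return starter                    # no answer within T: starter wins
    # Every live higher process answers and starts its own election;
    # the cascade terminates at the highest live identifier:
    return bully_election(alive, max(higher))

# After p4 (the coordinator) and then p3 fail, p1 notices first:
assert bully_election({1, 2}, 1) == 2     # p2 is elected, as in Figure 12.8
assert bully_election({1, 2, 3}, 1) == 3
```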
The bully algorithm
• In the best case, the process with the second-highest ID notices the coordinator's failure. It can then immediately elect itself and send N-2 coordinator messages.
• The bully algorithm requires O(N^2) messages in the worst case, that is, when the process with the lowest ID first detects the coordinator's failure: N-1 processes altogether then begin elections, each sending messages to the processes with higher IDs.
Consensus and Related Problems (agreement)
• The problem is for processes to agree on a value after one or more of the processes has proposed what that value should be. (E.g., all controlling computers should agree on whether to let a spaceship proceed or abort, after one computer proposes an action.)
• Assumptions: communication is reliable, but the processes may fail (arbitrary process failures as well as crashes). We also specify that up to some number f of the N processes are faulty.
Consensus problem
• Every process pi begins in the undecided state and proposes a single value vi, drawn from a set D. The processes communicate with one another, exchanging values. Each process then sets the value of a decision variable di; in doing so it enters the decided state, in which it may no longer change di.
• Requirements:
 • Termination: eventually each correct process sets its decision variable.
 • Agreement: the decision value of all correct processes is the same: if pi and pj are correct and have entered the decided state, then di = dj.
 • Integrity: if the correct processes all proposed the same value, then any correct process in the decided state has chosen that value. (This condition may be loosened; for example, it may suffice that the decided value was proposed by some correct process.)
The problem is straightforward to solve if no process can fail: each process multicasts its proposed value and takes a majority vote. Termination is guaranteed by the reliability of multicast; agreement and integrity are guaranteed by the definition of majority and by the fact that every process evaluates the same set of proposed values.
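The failure-free case can be sketched directly (illustrative Python; "reliable multicast" is modelled by every process seeing the same list of proposals):

```python
from collections import Counter

def majority(values):
    """The shared decision function: the strict-majority value, or None (⊥)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None

def consensus_no_failures(proposals):
    """Each process multicasts its proposal, collects the same set of
    values, and applies the same majority function, so all agree."""
    votes = list(proposals.values())      # every process sees all proposals
    return {p: majority(votes) for p in proposals}

d = consensus_no_failures({"p1": "proceed", "p2": "proceed", "p3": "abort"})
assert d == {"p1": "proceed", "p2": "proceed", "p3": "proceed"}
```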
Figure 12.17
Consensus for three processes
[Diagram: p1 (v1 = proceed), p2 (v2 = proceed) and p3 (v3 = abort, crashes) run a consensus algorithm; p1 and p2 decide d1 := proceed and d2 := proceed.]
Byzantine generals problem (proposed in 1982)
• Three or more generals are to agree to attack or to retreat. One, the commander, issues the order. The others, lieutenants to the commander, must decide whether to attack or retreat.
• But one or more of the generals may be treacherous, that is, faulty. If the commander is treacherous, he proposes attacking to one general and retreating to another. If a lieutenant is treacherous, he tells one of his peers that the commander told him to attack, and another that they are to retreat.
Byzantine generals problem and interactive consistency
• A. The Byzantine generals problem differs from consensus in that a distinguished process supplies a value that the others are to agree upon, instead of each of them proposing a value.
• Requirements:
 • Termination: eventually each correct process sets its decision variable.
 • Agreement: the decision value of all correct processes is the same.
 • Integrity: if the commander is correct, then all correct processes decide on the value that the commander proposed.
If the commander is correct, integrity implies agreement; but the commander need not be correct.
B. Interactive consistency problem: another variant of consensus, in which every process proposes a single value. The goal is for the correct processes to agree on a decision vector of values, one for each process.
Requirements:
• Termination: eventually each correct process sets its decision variable.
• Agreement: the decision vector of all correct processes is the same.
• Integrity: if pi is correct, then all correct processes decide on vi as the ith component of the vector.
Consensus in a synchronous system
• Uses a basic multicast protocol, assuming that up to f of the N processes exhibit crash failures.
• Each correct process collects proposed values from the other processes. The algorithm proceeds in f+1 rounds, in each of which the correct processes basic-multicast values between themselves. At most f processes may crash, by assumption; at worst, all f crash during the rounds, but the algorithm guarantees that at the end of the rounds all the surviving correct processes have the same final set of values and are thus in a position to agree.
Figure 12.18 
Consensus in a synchronous system 
At most f crashes can occur, and there are f+1 rounds, so the algorithm tolerates up to f crashes.
Any algorithm to reach consensus despite up to f crash failures requires at least f+1 rounds of message exchanges, no matter how it is constructed.
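A round-by-round simulation makes the f+1 bound concrete (a sketch with illustrative names; a process that crashes mid-multicast may reach only some peers in its crash round):

```python
def synchronous_consensus(values, f, partial_send=None):
    """f+1-round crash-tolerant consensus (a sketch). partial_send maps
    (process, round) -> the peers the process reaches before crashing."""
    procs = list(values)
    known = {p: {values[p]} for p in procs}   # values each process has seen
    new = {p: {values[p]} for p in procs}     # values not yet multicast
    crashed = set()
    for r in range(1, f + 2):                 # rounds 1 .. f+1
        delivered = {p: set() for p in procs}
        for s in procs:
            if s in crashed:
                continue
            targets = procs
            if partial_send and (s, r) in partial_send:
                targets = partial_send[(s, r)]   # crashes mid-multicast
                crashed.add(s)
            for t in targets:
                delivered[t] |= new[s]
        for p in procs:
            new[p] = delivered[p] - known[p]     # only forward fresh values
            known[p] |= delivered[p]
    return {p: min(known[p]) for p in procs if p not in crashed}

# p3 crashes in round 1, reaching only p1; round 2 spreads the value on:
d = synchronous_consensus({"p1": 2, "p2": 3, "p3": 1}, f=1,
                          partial_send={("p3", 1): {"p1"}})
assert d == {"p1": 1, "p2": 1}                # the survivors still agree
```

With only one round (f = 0), p2 would never learn p3's value and agreement would fail, which is exactly why f+1 rounds are needed.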
Figure 12.19
Three Byzantine generals
[Diagram, two scenarios; faulty processes are shown coloured. Left: the correct commander p1 sends 1:v to p2 and p3; p2 relays 2:1:v to p3; the faulty p3 sends 3:1:u to p2. Right: the faulty commander p1 sends 1:w to p2 and 1:x to p3; p2 relays 2:1:w; p3 relays 3:1:x.]
In "3:1:u", the first number indicates the source and the rest records who says what: from p3, "p1 says u".
If a solution exists, p2 is bound to decide on v when the commander is correct. Since no solution can distinguish between a correct and a faulty commander, p2 must choose the value sent by the commander in either case; by symmetry, p3 does the same. But this contradicts agreement when the commander is faulty, so no solution exists for N ≤ 3f. The root cause is that a correct general cannot tell which process is faulty. Digital signatures can solve this problem.
Figure 12.20
Four Byzantine generals
[Diagram, two scenarios; faulty processes are shown coloured. Left: the correct commander p1 sends 1:v to p2, p3 and p4; p2 and p4 relay 2:1:v and 4:1:v; the faulty p3 sends 3:1:u to p2 and 3:1:w to p4. Right: the faulty commander p1 sends 1:u to p2, 1:v to p3 and 1:w to p4; the correct lieutenants relay what they received (2:1:u, 3:1:w, 4:1:v).]
First round: the commander sends a value to each of the lieutenants.
Second round: each lieutenant sends the value it received to its peers.
A lieutenant thus receives a value from the commander plus N-2 values from its peers, and applies a simple majority function to the set of values it receives. A faulty process may omit to send a value; on timeout, the receiver records null as the received value.
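The lieutenant's decision rule can be sketched in Python (illustrative names; None models a timed-out, omitted value):

```python
from collections import Counter

def lieutenant_decision(received):
    """Apply the simple majority function to the commander's value plus
    the N-2 relayed values; return None (⊥) when there is no majority."""
    value, count = Counter(received).most_common(1)[0]
    return value if count > len(received) // 2 else None

# Figure 12.20, left: commander says v; faulty p3 tells p2 "u":
assert lieutenant_decision(["v", "v", "u"]) == "v"
# Figure 12.20, right: the faulty commander sends three different values;
# every correct lieutenant sees {u, v, w} and decides the same value, ⊥:
assert lieutenant_decision(["u", "v", "w"]) is None
```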

Distributed process and scheduling Distributed process and scheduling
Distributed process and scheduling
 
Scheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii VozniukScheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii Vozniuk
 
Distributed Event Routing in Publish/Subscribe Systems
Distributed Event Routing in Publish/Subscribe SystemsDistributed Event Routing in Publish/Subscribe Systems
Distributed Event Routing in Publish/Subscribe Systems
 
Chapter 15 distributed mm systems
Chapter 15 distributed mm systemsChapter 15 distributed mm systems
Chapter 15 distributed mm systems
 
Practical Byzantine Fault Tolerance
Practical Byzantine Fault TolerancePractical Byzantine Fault Tolerance
Practical Byzantine Fault Tolerance
 
4.file service architecture
4.file service architecture4.file service architecture
4.file service architecture
 
Data Replication in Distributed System
Data Replication in  Distributed SystemData Replication in  Distributed System
Data Replication in Distributed System
 
Lamport’s algorithm for mutual exclusion
Lamport’s algorithm for mutual exclusionLamport’s algorithm for mutual exclusion
Lamport’s algorithm for mutual exclusion
 
CPU Scheduling algorithms
CPU Scheduling algorithmsCPU Scheduling algorithms
CPU Scheduling algorithms
 
Replication in Distributed Database
Replication in Distributed DatabaseReplication in Distributed Database
Replication in Distributed Database
 
Database Replication
Database ReplicationDatabase Replication
Database Replication
 

Similar to Chapter 11c coordination agreement

Chapter 2 system models
Chapter 2 system modelsChapter 2 system models
Chapter 2 system modelsAbDul ThaYyal
 
Lecture 5- Process Synchonization_revised.pdf
Lecture 5- Process Synchonization_revised.pdfLecture 5- Process Synchonization_revised.pdf
Lecture 5- Process Synchonization_revised.pdfAmanuelmergia
 
Mutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's AlgorithmMutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's AlgorithmSouvik Roy
 
Mutual Exclusion in Distributed Memory Systems
Mutual Exclusion in Distributed Memory SystemsMutual Exclusion in Distributed Memory Systems
Mutual Exclusion in Distributed Memory SystemsDilum Bandara
 
Message Passing, Remote Procedure Calls and Distributed Shared Memory as Com...
Message Passing, Remote Procedure Calls and  Distributed Shared Memory as Com...Message Passing, Remote Procedure Calls and  Distributed Shared Memory as Com...
Message Passing, Remote Procedure Calls and Distributed Shared Memory as Com...Sehrish Asif
 
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdfDC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdfLegesseSamuel
 
Coordination and Agreement .ppt
Coordination and Agreement .pptCoordination and Agreement .ppt
Coordination and Agreement .pptSOURAVKUMAR723356
 
9 fault-tolerance
9 fault-tolerance9 fault-tolerance
9 fault-tolerance4020132038
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSKathirvel Ayyaswamy
 
Critical Section in Operating System
Critical Section in Operating SystemCritical Section in Operating System
Critical Section in Operating SystemChandnigupta80
 
NLP applied to French legal decisions
NLP applied to French legal decisionsNLP applied to French legal decisions
NLP applied to French legal decisionsMichael BENESTY
 
Planning Payclo (Project management)
Planning Payclo (Project management)Planning Payclo (Project management)
Planning Payclo (Project management)AshuGupta56
 
Ch17 OS
Ch17 OSCh17 OS
Ch17 OSC.U
 
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...CSCJournals
 

Similar to Chapter 11c coordination agreement (20)

Chapter 11b
Chapter 11bChapter 11b
Chapter 11b
 
Cordination&agreement
Cordination&agreementCordination&agreement
Cordination&agreement
 
Chapter 2 system models
Chapter 2 system modelsChapter 2 system models
Chapter 2 system models
 
Basic Paxos Implementation in Orc
Basic Paxos Implementation in OrcBasic Paxos Implementation in Orc
Basic Paxos Implementation in Orc
 
Lecture 5- Process Synchonization_revised.pdf
Lecture 5- Process Synchonization_revised.pdfLecture 5- Process Synchonization_revised.pdf
Lecture 5- Process Synchonization_revised.pdf
 
Mutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's AlgorithmMutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's Algorithm
 
Mutual Exclusion in Distributed Memory Systems
Mutual Exclusion in Distributed Memory SystemsMutual Exclusion in Distributed Memory Systems
Mutual Exclusion in Distributed Memory Systems
 
Message Passing, Remote Procedure Calls and Distributed Shared Memory as Com...
Message Passing, Remote Procedure Calls and  Distributed Shared Memory as Com...Message Passing, Remote Procedure Calls and  Distributed Shared Memory as Com...
Message Passing, Remote Procedure Calls and Distributed Shared Memory as Com...
 
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdfDC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
 
Coordination and Agreement .ppt
Coordination and Agreement .pptCoordination and Agreement .ppt
Coordination and Agreement .ppt
 
dss
dssdss
dss
 
9 fault-tolerance
9 fault-tolerance9 fault-tolerance
9 fault-tolerance
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Critical Section in Operating System
Critical Section in Operating SystemCritical Section in Operating System
Critical Section in Operating System
 
NLP applied to French legal decisions
NLP applied to French legal decisionsNLP applied to French legal decisions
NLP applied to French legal decisions
 
Planning Payclo (Project management)
Planning Payclo (Project management)Planning Payclo (Project management)
Planning Payclo (Project management)
 
CS6601 DISTRIBUTED SYSTEMS
CS6601 DISTRIBUTED SYSTEMSCS6601 DISTRIBUTED SYSTEMS
CS6601 DISTRIBUTED SYSTEMS
 
Ch17 OS
Ch17 OSCh17 OS
Ch17 OS
 
OS_Ch17
OS_Ch17OS_Ch17
OS_Ch17
 
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
 

More from AbDul ThaYyal

Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systemsAbDul ThaYyal
 
Chapter 4 a interprocess communication
Chapter 4 a interprocess communicationChapter 4 a interprocess communication
Chapter 4 a interprocess communicationAbDul ThaYyal
 
Chapter 3 networking and internetworking
Chapter 3 networking and internetworkingChapter 3 networking and internetworking
Chapter 3 networking and internetworkingAbDul ThaYyal
 
Chapter 1 characterisation of distributed systems
Chapter 1 characterisation of distributed systemsChapter 1 characterisation of distributed systems
Chapter 1 characterisation of distributed systemsAbDul ThaYyal
 
4.file service architecture (1)
4.file service architecture (1)4.file service architecture (1)
4.file service architecture (1)AbDul ThaYyal
 
4. concurrency control
4. concurrency control4. concurrency control
4. concurrency controlAbDul ThaYyal
 
3. distributed file system requirements
3. distributed file system requirements3. distributed file system requirements
3. distributed file system requirementsAbDul ThaYyal
 

More from AbDul ThaYyal (16)

Chapter 17 corba
Chapter 17 corbaChapter 17 corba
Chapter 17 corba
 
Chapter 11
Chapter 11Chapter 11
Chapter 11
 
Chapter 9 names
Chapter 9 namesChapter 9 names
Chapter 9 names
 
Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systems
 
Chapter 7 security
Chapter 7 securityChapter 7 security
Chapter 7 security
 
Chapter 6 os
Chapter 6 osChapter 6 os
Chapter 6 os
 
Chapter 5
Chapter 5Chapter 5
Chapter 5
 
Chapter 4 a interprocess communication
Chapter 4 a interprocess communicationChapter 4 a interprocess communication
Chapter 4 a interprocess communication
 
Chapter 3 networking and internetworking
Chapter 3 networking and internetworkingChapter 3 networking and internetworking
Chapter 3 networking and internetworking
 
Chapter 1 characterisation of distributed systems
Chapter 1 characterisation of distributed systemsChapter 1 characterisation of distributed systems
Chapter 1 characterisation of distributed systems
 
4.file service architecture (1)
4.file service architecture (1)4.file service architecture (1)
4.file service architecture (1)
 
4. system models
4. system models4. system models
4. system models
 
4. concurrency control
4. concurrency control4. concurrency control
4. concurrency control
 
3. distributed file system requirements
3. distributed file system requirements3. distributed file system requirements
3. distributed file system requirements
 
3. challenges
3. challenges3. challenges
3. challenges
 
2. microkernel new
2. microkernel new2. microkernel new
2. microkernel new
 

Chapter 11c coordination agreement

  • 1. Slides for Chapter 8: Coordination and Agreement From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, © Pearson Education 2005
  • 2. Topics A set of processes coordinate their actions: how to agree on one or more values. Avoid a fixed master-slave relationship, since a fixed master is a single point of failure. Distributed mutual exclusion for resource sharing: when a collection of processes shares resources, mutual exclusion is needed to prevent interference and ensure consistency (the critical section problem). No shared variables or facilities provided by a single local kernel can solve it; a solution based solely on message passing is required. An important factor to consider when designing an algorithm is failure. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005
  • 3. Distributed Mutual Exclusion An application-level protocol for executing a critical section: enter() // enter critical section, blocking if necessary; resourceAccess() // access shared resources; exit() // leave critical section, so other processes may enter. Essential requirements: ME1 (safety): at most one process may execute in the critical section at a time. ME2 (liveness): requests to enter and exit the critical section eventually succeed. ME3 (ordering): if one request to enter the CS happened-before another, then entry to the CS is granted in that order. ME2 implies freedom from both deadlock and starvation. Starvation involves a fairness condition: the order in which processes enter the critical section. It is not possible to use request times to order them, due to the lack of a global clock, so we usually use happened-before ordering to order entry requests.
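The enter()/resourceAccess()/exit() pattern on this slide can be sketched as a Python context manager; `critical_section` and the lock's method names are hypothetical illustrations mirroring the slide's pseudocode, not an API from the book:

```python
from contextlib import contextmanager

@contextmanager
def critical_section(lock):
    """Wrap the enter/access/exit protocol so exit() always runs.

    `lock` is any object exposing enter() and exit() (hypothetical names
    mirroring the slide's pseudocode)."""
    lock.enter()        # enter critical section, blocking if necessary
    try:
        yield           # caller performs resourceAccess() here
    finally:
        lock.exit()     # leave critical section; other processes may enter
```

Any of the mutual-exclusion algorithms on the following slides could sit behind enter()/exit().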
  • 4. Performance Evaluation Bandwidth consumption, which is proportional to the number of messages sent in each entry and exit operation. The client delay incurred by a process at each entry and exit operation. The throughput of the system: the rate at which the collection of processes as a whole can access the critical section. We measure this using the synchronization delay between one process exiting the critical section and the next process entering it; the shorter the delay, the greater the throughput.
  • 5. Central Server Algorithm The simplest way to grant permission to enter the critical section is to employ a server. A process sends a request message to the server and awaits a reply from it. The reply constitutes a token signifying permission to enter the critical section. If no other process holds the token at the time of the request, the server replies immediately with the token. If the token is currently held by another process, the server does not reply but queues the request. On exiting the critical section, the client sends a message to the server, giving the token back.
  • 6. Figure 12.2 Central server algorithm: managing a mutual exclusion token for a set of processes. 1. Request token; 2. Release token; 3. Grant token; the server holds a queue of requests. ME1 (safety) and ME2 (liveness) are satisfied, but not ME3 (ordering). Bandwidth: entering takes two messages (a request followed by a grant), delayed by the round-trip time; exiting takes one release message and does not delay the exiting process. Throughput is measured by the synchronization delay: the round trip of a release message followed by a grant message.
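The server's token management described above can be sketched as a single-threaded simulation; `TokenServer` and its method names are illustrative, not from the book:

```python
from collections import deque

class TokenServer:
    """Central server managing a mutual-exclusion token (sketch)."""

    def __init__(self):
        self.holder = None       # process currently holding the token
        self.queue = deque()     # queued requests, FIFO

    def request(self, pid):
        """Return True if the token is granted at once, else queue the request."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)
        return False

    def release(self, pid):
        """Holder returns the token; grant it to the next queued process.

        Returns the pid the token is granted to next, or None if idle."""
        assert self.holder == pid, "only the holder may release"
        if self.queue:
            self.holder = self.queue.popleft()
            return self.holder
        self.holder = None
        return None
```

Note that the FIFO queue grants in arrival order at the server, which need not match happened-before order of the requests, hence ME3 is not satisfied.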
  • 7. Ring-based Algorithm The simplest way to arrange mutual exclusion between N processes without requiring an additional process is to arrange them in a logical ring. Each process pi has a communication channel to the next process in the ring, p((i+1) mod N). The unique token takes the form of a message passed from process to process in a single direction (clockwise). If a process does not need to enter the CS when it receives the token, it immediately forwards the token to its neighbour. A process that requires the token waits until it receives it, then retains it. To exit the critical section, the process sends the token on to its neighbour.
  • 8. Figure 12.3 A ring of processes transferring a mutual exclusion token. ME1 (safety) and ME2 (liveness) are satisfied, but not ME3 (ordering). Bandwidth: the token continuously consumes bandwidth, except while a process is inside the CS; exit requires only one message. Delay: the entry delay experienced by a process ranges from zero messages (it has just received the token) to N messages (it has just passed the token on). Throughput: the synchronization delay between one exit and the next entry is anywhere from 1 (the next process in the ring) to N (the same process again) message transmissions.
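Token circulation around the ring can be simulated to see the entry order; `ring_token_order` is a hypothetical helper, not from the book:

```python
def ring_token_order(n, requesters, start=0):
    """Simulate the token passing clockwise around a ring of n processes,
    beginning at position `start`, and return the order in which the
    processes in `requesters` enter the critical section."""
    order = []
    pending = set(requesters)
    pos = start
    while pending:
        if pos in pending:       # holder wants the CS: enter, then pass on
            order.append(pos)
            pending.discard(pos)
        pos = (pos + 1) % n      # otherwise forward token to the neighbour
    return order
```

The simulation makes the ME3 violation visible: entry order depends on ring position relative to the token, not on when the requests were made.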
  • 9. Using Multicast and Logical Clocks Mutual exclusion between N peer processes based upon multicast. Processes that require entry to a critical section multicast a request message, and can enter it only when all the other processes have replied to this message. The conditions under which a process replies to a request are designed to ensure that ME1, ME2 and ME3 are met. Each process pi keeps a Lamport clock. Messages requesting entry are of the form <T, pi>. Each process records its state, one of RELEASED, WANTED or HELD, in a variable state. If a process requests entry and all other processes are RELEASED, then all processes reply immediately. If some process is in state HELD, then that process will not reply until it has finished. If some process is in state WANTED and has a smaller timestamp than the incoming request, it will queue the request until it has finished. If two or more processes request entry at the same time, then whichever bears the lowest timestamp will be the first to collect N-1 replies.
  • 10. Figure 12.4 Ricart and Agrawala’s algorithm. On initialization: state := RELEASED. To enter the section: state := WANTED; multicast request to all processes (request processing deferred here); T := request’s timestamp; wait until (number of replies received = (N – 1)); state := HELD. On receipt of a request <Ti, pi> at pj (i ≠ j): if (state = HELD or (state = WANTED and (T, pj) < (Ti, pi))) then queue request from pi without replying; else reply immediately to pi; end if. To exit the critical section: state := RELEASED; reply to any queued requests.
  • 11. Figure 12.5 Multicast synchronization. P1 and P2 request the CS concurrently. The timestamp of P1 is 41 and that of P2 is 34. When P3 receives their requests, it replies immediately. When P2 receives P1’s request, it finds that its own request has the lower timestamp, so it does not reply and holds P1’s request in a queue. P1, however, replies. P2 enters the CS; when it finishes, it replies to P1 and P1 enters the CS. Granting entry takes 2(N-1) messages: N-1 to multicast the request and N-1 replies. Bandwidth consumption is high. Client delay is again one round-trip time. Synchronization delay is one message transmission time.
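The deferral rule that produces the behaviour in this scenario compares (timestamp, process id) pairs; a sketch (`should_defer` is an illustrative name, not from the book):

```python
def should_defer(my_state, my_ts, my_pid, req_ts, req_pid):
    """Receipt rule at a process for an incoming request <req_ts, req_pid>:
    defer the reply iff we hold the CS, or we want it and our own
    (timestamp, pid) request pair orders before the incoming one."""
    if my_state == 'HELD':
        return True
    return my_state == 'WANTED' and (my_ts, my_pid) < (req_ts, req_pid)
```

Tuple comparison on (timestamp, pid) breaks timestamp ties by process id, giving the total order the algorithm needs.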
  • 12. Maekawa’s Voting Algorithm It is not necessary for all of a process’s peers to grant it access. Processes need only obtain permission to enter from subsets of their peers, as long as the subsets used by any two processes overlap. Think of processes as voting for one another to enter the CS. A candidate process must collect sufficient votes to enter. Processes in the intersection of two sets of voters ensure the safety property ME1 by casting their votes for only one candidate.
  • 13. Maekawa’s Voting Algorithm A voting set Vi is associated with each process pi: Vi ⊆ {p1, p2, ..., pN}, such that for all i, j = 1, 2, ..., N: pi ∈ Vi; Vi ∩ Vj ≠ ∅ (any two voting sets have at least one common member); |Vi| = K (all voting sets have the same size, for fairness); and each process is contained in M of the voting sets. The optimal solution, which minimizes K, has K ≈ √N and M = K.
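One standard construction meeting these intersection conditions (for a perfect-square N) arranges the processes in a √N × √N grid: a process’s voting set is its whole row plus its whole column, giving |Vi| = 2√N − 1. This does not achieve the optimal K ≈ √N, which needs a finite-projective-plane construction, but it is simple and every pair of sets intersects. A sketch (`grid_voting_sets` is hypothetical):

```python
import math

def grid_voting_sets(n):
    """Build voting sets for n processes laid out in a k x k grid
    (k = sqrt(n); n assumed a perfect square): Vi = pi's row + pi's column.
    Any row of one set crosses any column of another, so all sets intersect."""
    k = math.isqrt(n)
    assert k * k == n, "sketch assumes a perfect-square n"
    sets = []
    for p in range(n):
        r, c = divmod(p, k)
        row = {r * k + j for j in range(k)}
        col = {i * k + c for i in range(k)}
        sets.append(row | col)
    return sets
```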
  • 14. Figure 12.6 Maekawa’s algorithm – part 1. On initialization: state := RELEASED; voted := FALSE. For pi to enter the critical section: state := WANTED; multicast request to all processes in Vi; wait until (number of replies received = K); state := HELD. On receipt of a request from pi at pj: if (state = HELD or voted = TRUE) then queue request from pi without replying; else send reply to pi; voted := TRUE; end if. For pi to exit the critical section: state := RELEASED; multicast release to all processes in Vi. On receipt of a release from pi at pj: if (queue of requests is non-empty) then remove head of queue – from pk, say; send reply to pk; voted := TRUE; else voted := FALSE; end if
  • 15. Maekawa’s Algorithm ME1 is met: if two processes could enter the CS at the same time, the processes in the intersection of their two voting sets would have to vote for both, but the algorithm allows a process to cast at most one vote between successive receipts of a release message. It is deadlock prone. For example, consider p1, p2 and p3 with V1 = {p1, p2}, V2 = {p2, p3}, V3 = {p3, p1}. If the three processes concurrently request entry to the CS, then it is possible for p1 to reply to itself and hold off p2, for p2 to reply to itself and hold off p3, and for p3 to reply to itself and hold off p1. Each process has received one of the two replies it needs, and none can proceed. If processes queue outstanding requests in happened-before order, ME3 is satisfied and the algorithm is deadlock free. Bandwidth utilization is 2√N messages per entry to the CS and √N per exit. Client delay is the same as for Ricart and Agrawala’s algorithm: one round-trip time. Synchronization delay is one round-trip time, which is worse than R&A.
  • 16. Fault Tolerance What happens when messages are lost? What happens when a process crashes? None of the algorithms that we have described would tolerate the loss of messages if the channels were unreliable. The ring-based algorithm cannot tolerate any single process crash failure. Maekawa’s algorithm can tolerate some process crash failures: a crashed process is harmless if it is not in a voting set that is required. The central server algorithm can tolerate the crash failure of a client process that neither holds nor has requested the token. The Ricart and Agrawala algorithm as we have described it can be adapted to tolerate the crash failure of such a process, by taking it to grant all requests implicitly.
  • 17. Elections An algorithm for choosing a unique process to play a particular role is called an election algorithm. E.g. with a central server for mutual exclusion, one process must be elected as the server, and everybody must agree. If the server wishes to retire, another election is required to choose a replacement. Requirements: E1 (safety): a participant process pi has elected_i = ⊥ or elected_i = P, where P is chosen as the non-crashed process at the end of the run with the largest identifier (concurrent elections are possible). E2 (liveness): all processes pi participate in the election and eventually either set elected_i ≠ ⊥ or crash.
  • 18. A Ring-based Election Algorithm All processes are arranged in a logical ring, and each process has a communication channel to the next. All messages are sent clockwise around the ring. Assume that no failures occur and that the system is asynchronous. The goal is to elect a single coordinator: the process with the largest identifier.
  • 19. Figure 12.7 A ring-based election in progress. 1. Initially, every process is marked as a non-participant. Any process can begin an election. 2. The starting process marks itself as a participant and places its identifier in a message to its neighbour. 3. A process receives the message and compares the identifier with its own. If the arrived identifier is larger, it passes on the message. 4. If the arrived identifier is smaller and the receiver is not a participant, it substitutes its own identifier in the message and forwards it; it does not forward the message if it is already a participant. 5. On forwarding a message in any case, the process marks itself as a participant. 6. If the received identifier is that of the receiver itself, then this process’s identifier must be the greatest, and it becomes the coordinator. 7. The coordinator marks itself as a non-participant, sets elected_i, and sends an elected message to its neighbour announcing its ID. 8. When a process receives an elected message, it marks itself as a non-participant, sets its variable elected_i, and forwards the message.
  • 20. A ring-based election in progress. Note: the election was started by process 17; the highest process identifier encountered so far is 24; participant processes are shown darkened. E1 is met: all identifiers are compared, since a process must receive its own ID back before sending an elected message. E2 is also met, due to the guaranteed traversals of the ring. Tolerating no failures makes the ring algorithm of limited practical use. If only a single process starts an election, the worst case is when its anti-clockwise neighbour has the highest identifier: a total of N-1 messages is used to reach this neighbour, a further N messages are required for that identifier to complete its circuit, and the elected message is then sent N times, making 3N-1 messages in all. Turnaround time is also 3N-1 sequential message transmission times.
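The 3N-1 worst-case message count can be checked with a small simulation; `ring_election` is a hypothetical helper, assuming distinct identifiers and a single initiator:

```python
def ring_election(ids, starter):
    """Simulate a single-initiator ring election over distinct ids.
    ids[i] is the identifier of the process at ring position i; messages
    travel clockwise (increasing position, mod N).
    Returns (winning_id, total_messages)."""
    n = len(ids)
    token = ids[starter]              # identifier currently in the message
    pos = (starter + 1) % n
    msgs = 1                          # the starter's initial send
    while ids[pos] != token:          # until the id returns to its owner
        token = max(token, ids[pos])  # smaller ids are substituted away
        pos = (pos + 1) % n
        msgs += 1
    msgs += n                         # the elected announcement circles once
    return token, msgs
```

With the highest id at the starter’s anti-clockwise neighbour the count is 3N-1; when the starter itself has the highest id it drops to 2N.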
  • 21. The Bully Algorithm Allows processes to crash during an election, although it assumes that message delivery between processes is reliable. It assumes the system is synchronous, using timeouts to detect process failure: T = 2T_trans + T_process from the time of sending. It also assumes that each process knows which processes have higher identifiers and that it can communicate with all such processes. Three types of messages: an election message is sent to announce an election; a process begins an election when it notices, through timeouts, that the coordinator has failed. An answer message is sent in response to an election message. A coordinator message is sent to announce the identity of the elected process.
  • 22. Figure 12.8 The bully algorithm: the election of coordinator p2, after the failure of p4 and then p3. 1. A process begins an election by sending an election message to those processes that have a higher ID and awaits an answer in response. If none arrives within time T, the process considers itself the coordinator and sends a coordinator message to all processes with lower identifiers. Otherwise, it waits a further time T’ for a coordinator message to arrive; if none does, it begins another election. 2. If a process receives a coordinator message, it sets its variable elected_i to the coordinator’s ID. 3. If a process receives an election message, it sends back an answer message and begins another election, unless it has begun one already. E1 may be broken if the timeout is inaccurate or if a crashed process is replaced (suppose p3 crashes and is replaced by another process: p2 may set p3 as coordinator while p1 sets p2 as coordinator). E2 is clearly met by the assumption of reliable transmission.
  • 23. The Bully Algorithm In the best case, the process with the second-highest ID notices the coordinator’s failure; it can immediately elect itself and send N-2 coordinator messages. The bully algorithm requires O(N^2) messages in the worst case, that is, when the process with the lowest ID first detects the coordinator’s failure, for then N-1 processes altogether begin elections, each sending messages to the processes with higher IDs.
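A simplified, sequential simulation of the bully algorithm (`run_bully` is illustrative; it lets only the lowest-id answerer carry on the election, whereas in reality all answerers start elections concurrently):

```python
def run_bully(n, alive, detector):
    """Simulate bully elections among processes 0..n-1 (id = ring index).
    `alive` is the set of non-crashed processes; `detector` is the process
    that first notices the coordinator's failure.
    Returns (new_coordinator, total_messages)."""
    msgs = 0
    p = detector
    while True:
        higher = list(range(p + 1, n))
        msgs += len(higher)            # election messages, even to crashed ids
        answerers = [q for q in higher if q in alive]
        if not answerers:              # no answer within timeout T: p wins
            msgs += p                  # coordinator messages to all lower ids
            return p, msgs
        msgs += len(answerers)         # answer messages back to p
        p = min(answerers)             # simplification: lowest answerer continues
```

With zero-based ids (p1 = 0, ..., p4 = 3), the Figure 12.8 scenario, where p3 and p4 have crashed and p1 detects the failure, elects p2.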
  • 24. Consensus and Related Problems (Agreement) The problem is for processes to agree on a value after one or more of the processes has proposed what that value should be. (E.g. all controlling computers should agree on whether to let the spaceship proceed or abort after one computer proposes an action.) Assumptions: communication is reliable, but the processes may fail (arbitrary process failure as well as crash). We also specify that up to some number f of the N processes are faulty.
  • 25. Consensus Problem Every process pi begins in the undecided state and proposes a single value vi, drawn from a set D. The processes communicate with one another, exchanging values. Each process then sets the value of a decision variable di; in doing so it enters the decided state, in which it may no longer change di. Requirements: Termination: eventually each correct process sets its decision variable. Agreement: the decision value of all correct processes is the same: if pi and pj are correct and have entered the decided state, then di = dj. Integrity: if the correct processes all proposed the same value, then any correct process in the decided state has chosen that value. (This condition can be loosened; for example, it need not hold for all of them, merely some of them.) It is straightforward to solve this problem if no process can fail, using multicast and majority vote: termination is guaranteed by the reliability of the multicast, and agreement and integrity are guaranteed by the majority definition and by each process having the same set of proposed values to evaluate.
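The failure-free solution sketched above, reliable multicast of all proposals followed by the same deterministic majority rule at every process, might look like this (`majority_consensus` is an illustrative name):

```python
from collections import Counter

def majority_consensus(proposals):
    """Decide from the full multiset of proposed values: pick the most
    frequent value, breaking ties by the smaller value, so every process,
    seeing the same multiset, decides identically."""
    counts = Counter(proposals)
    return min(counts, key=lambda v: (-counts[v], v))
```

The deterministic tie-break matters: agreement follows only because every process applies exactly the same rule to the same set of values.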
  • 26. Figure 12.17 Consensus for three processes: P1 proposes v1 = proceed, P2 proposes v2 = proceed, and P3 proposes v3 = abort but crashes; the consensus algorithm leads the correct processes to decide d1 := proceed and d2 := proceed.
  • 27. Byzantine Generals Problem (proposed in 1982) Three or more generals are to agree to attack or to retreat. One, the commander, issues the order; the others, lieutenants to the commander, must decide whether to attack or retreat. But one or more of the generals may be treacherous, that is, faulty. If the commander is treacherous, he proposes attacking to one general and retreating to another. If a lieutenant is treacherous, he tells one of his peers that the commander told him to attack and another that they are to retreat.
  • 28. Byzantine generals problem and interactive consistency A. The Byzantine generals problem differs from consensus in that a distinguished process supplies a value that the others are to agree upon, instead of each process proposing a value. Requirements: Termination: eventually each correct process sets its decision variable. Agreement: the decision value of all correct processes is the same. Integrity: if the commander is correct, then all correct processes decide on the value that the commander proposed. If the commander is correct, integrity implies agreement; but the commander need not be correct. B. Interactive consistency: another variant of consensus, in which every process proposes a single value. The goal is for the correct processes to agree on a decision vector of values, one for each process. Requirements: Termination: eventually each correct process sets its decision variable. Agreement: the decision vector of all correct processes is the same. Integrity: if pi is correct, then all correct processes decide on vi as the ith component of the vector.
  • 29. Consensus in a synchronous system A basic-multicast protocol, assuming that up to f of the N processes exhibit crash failures. Each correct process collects proposed values from the other processes. The algorithm proceeds in f+1 rounds, in each of which the correct processes B-multicast among themselves any values they have not already sent. At most f processes may crash, by assumption; at worst, all f crash during the rounds, but the algorithm guarantees that at the end of the f+1 rounds all the surviving correct processes hold the same final set of values and are therefore in a position to agree.
  • 30. Figure 12.18 Consensus in a synchronous system. At most f crashes can occur, and there are f+1 rounds, so there must be at least one round during which no process crashes; in that round every value sent reaches all surviving processes, which is why the algorithm tolerates up to f crashes. Moreover, any algorithm that reaches consensus despite up to f crash failures requires at least f+1 rounds of message exchange, no matter how it is constructed. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005
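The f+1-round algorithm above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the book's code: a process that crashes in a round is pessimistically modelled as reaching no one in that round, and survivors decide by applying the same deterministic function (minimum) to their final value set.

```python
def synchronous_consensus(initial, f, crashed_in_round):
    """Simulate f+1 rounds of B-multicast among N processes with up to
    f crash failures. `initial` maps pid -> proposed value;
    `crashed_in_round` maps pid -> the round in which it crashes
    (modelled as sending to no one that round, then staying silent).
    Returns the decision of every surviving process."""
    values = {p: {v} for p, v in initial.items()}   # Values_i, one set per process
    alive = set(initial)
    for r in range(1, f + 2):                        # rounds 1 .. f+1
        # Each live process B-multicasts its current value set.
        msgs = {p: set(values[p]) for p in alive}
        # Processes crashing this round deliver nothing and die.
        for p in list(alive):
            if crashed_in_round.get(p) == r:
                alive.discard(p)
                del msgs[p]
        for p in alive:
            for vs in msgs.values():
                values[p] |= vs
    # After f+1 rounds every survivor holds the same set; all apply min().
    return {p: min(values[p]) for p in alive}

# p3 proposes "abort" but crashes in round 1 before reaching anyone:
print(synchronous_consensus(
    {"p1": "proceed", "p2": "proceed", "p3": "abort"},
    f=1, crashed_in_round={"p3": 1}))
```

Because at most f crashes occur across f+1 rounds, at least one round completes without a crash, after which all survivors hold identical sets and the shared decision function yields the same value everywhere.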
  • 31. Figure 12.19 Three Byzantine generals (faulty processes are shown coloured). Notation: in a label such as 3:1:u, the first number is the source and the second is who it reports on, so a lieutenant receiving 3:1:u learns "p3 says that p1 said u". Left scenario: the commander p1 is correct and sends 1:v to both lieutenants, but the faulty p3 relays 3:1:u to p2. Right scenario: the commander p1 is faulty and sends 1:w to p2 and 1:x to p3. If a solution exists, p2 is bound to decide on v when the commander is correct. But no solution lets p2 distinguish a correct commander with a faulty peer from a faulty commander, so p2 must also choose the value sent by the commander in the second scenario; by symmetry, p3 does the same. This contradicts agreement: no solution exists when N ≤ 3f (here N = 3, f = 1), all because a correct general cannot tell which process is faulty. Digitally signed messages circumvent this impossibility.
  • 32. Figure 12.20 Four Byzantine generals (faulty processes are shown coloured). First round: the commander sends a value to each of the lieutenants. Second round: each lieutenant sends the value it received to its peers. A lieutenant thus receives one value from the commander plus N-2 values from its peers, and applies a simple majority function to this set. A faulty process may omit to send a value; if the receiver times out, it records null as the received value. Left scenario: the commander is correct and sends 1:v to all; the faulty p3 relays 3:1:u and 3:1:w, but p2 and p4 each still see a majority for v. Right scenario: the faulty commander sends 1:u, 1:w and 1:v to p2, p3 and p4 respectively; after the exchange every correct lieutenant sees the same set {u, v, w}, finds no majority, and so all decide on the same default value, preserving agreement.
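The lieutenant's decision rule in the N = 4, f = 1 case above can be sketched as follows. This is an illustrative sketch, not the textbook's algorithm statement: `None` stands in for the null value recorded on timeout, and the default returned when no majority exists is left as `None` (any fixed default works, as long as all correct lieutenants use the same one).

```python
from collections import Counter

def lieutenant_decision(from_commander, peer_reports):
    """Decide as a lieutenant among N=4 generals with at most f=1 fault:
    combine the value received directly from the commander with the
    N-2 values relayed by peers, then apply a simple majority.
    A timed-out (missing) message is passed in as None.
    Returns the majority value, or None (the shared default) if none."""
    votes = [from_commander] + list(peer_reports)
    value, n = Counter(votes).most_common(1)[0]
    return value if n > len(votes) // 2 else None

# Correct commander sent v; faulty p3 relays w -- majority still v:
print(lieutenant_decision("v", ["v", "w"]))   # v
# Faulty commander sent u, w, v to the three lieutenants -- every
# correct lieutenant sees {u, v, w}, no majority, same default:
print(lieutenant_decision("u", ["v", "w"]))   # None
```

Note that with a faulty commander all correct lieutenants compute over the same set of values and therefore reach the same default, which is exactly the agreement property; integrity only binds them to the commander's value when the commander is correct.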