Totem protocol
Bin Liu
bliu@suse.com
Guideline
Introduction
Totem Single Ring Protocol
Totem Redundant Ring Protocol
Introduction
• SRP: the Totem Single-Ring Ordering and Membership Protocol
  ◦ Supports high-performance, fault-tolerant distributed systems that continue to operate despite network partitioning and remerging, and despite processor failure and restart.
  ◦ Provides totally ordered message delivery with low overhead, high throughput, and low latency using a logical token-passing ring.
  ◦ Provides rapid detection of network partitioning and processor failure, together with reconfiguration and membership services.
• RRP: the Totem Redundant Ring Protocol
  ◦ Built on SRP (you can think of it as a layer that modifies SRP's send/receive paths).
  ◦ Improves reliability by configuring an extra network interface, so that nodes can keep communicating even if one network goes offline.
Introduction
• Processor:
A corosync node that is a member of the CPG (Closed Process Group)
• Application:
A program that uses corosync to communicate, for example pacemaker, dlm, or sheepdog
Introduction
[Figure: processors P1, P2, P3, P4 in a ring, with applications A1, A2, A3, A4. Pn: Processor; An: Application]
Introduction
• Broadcast:
One Processor => all Processors
• Transmit/Forward token:
One Processor => next Processor
• Delivery:
One Processor => associated Application
Reliable Ordered Delivery Services
Reliable Delivery for Configuration C
Each message has a unique identifier.
If processor p delivers message m, then p delivers m only once. If p delivers two different messages, then p delivers one of them strictly before it delivers the other.
If p originates message m, then p will deliver m, or will fail before delivering a Configuration Change message that installs a new regular configuration.
If p is a member of regular configuration C and no configuration change occurs, then p will deliver in C all the messages originated in C.
If p delivers message m originated in C, then p is a member of C.
If p and q are both members of configurations C1 and C2, then p and q deliver the same set of messages in C1 before delivering the Configuration Change message that terminates C1 and starts C2.
Reliable Ordered Delivery Services
• Delivery in Causal Order
Delivery order respects Lamport causality within a configuration.
• Delivery in Agreed Order
Guarantees that processors deliver messages in a consistent total order. When a processor delivers a message, it has already delivered all preceding messages in that total order.
• Delivery in Safe Order
When a processor delivers a message, it has determined that every processor in the current configuration has received the message and will deliver it unless that processor fails.
Totem Single Ring Protocol
Totem Single Ring Protocol
• The Totem Ordering Protocol (OP)
• The Membership Protocol (MP)
• The Recovery Protocol (RP)
The Totem Ordering Protocol
Runs in the Operational state.
Ensures that messages are delivered to the Application in Agreed Order or Safe Order.
The Application can request either Agreed Order or Safe Order.
Processors use the token to place messages in a total order and deliver them one by one.
An Example
[Figure: the ring P1, P2, P3, P4 with A1 attached to P1; the queue at P1 holds M3 M2 M1]
• A1 asks P1 to send three messages: M1, M2, M3 (queued in P1's request queue).
• Suppose P1 holds the token; it broadcasts M1, M2, M3.
• P1 also saves them in its own receive queue.
An Example
• P2 received only M1 and M2, while P3 and P4 received M1, M2, and M3.
• P1 transmits the token to P2. The token's seq field says that the highest sequence number in P1's receive queue is 3.
• P2 compares seq with the messages it holds and finds that M3 was lost.
[Figure: receive queues are P1: M3 M2 M1, P2: M2 M1, P3: M3 M2 M1, P4: M3 M2 M1. Token: seq=3, aru=3, aru_id=P1, rtr=(empty)]
An Example
• P2 lowers aru (all-received-up-to) in the token to 2 and sets rtr to 3 to request retransmission of M3.
• P2 then transmits the token to P3.
• On receiving the token, P3 rebroadcasts M3 to the cluster.
• After clearing rtr, P3 transmits the token to P4.
[Figure: P3 rebroadcasts M3. Receive queues are P1: M3 M2 M1, P2: M2 M1, P3: M3 M2 M1, P4: M3 M2 M1. Token as sent by P2: seq=3, aru=2, aru_id=P2, rtr=3]
An Example
• P2 receives the M3 rebroadcast by P3; the other processors ignore it because they already hold M3.
• P4 receives the token from P3, has nothing to do, and transmits the token to P1.
[Figure: all four receive queues now hold M3 M2 M1. Token: seq=3, aru=2, aru_id=P2, rtr=(empty)]
An Example
• P1 receives the token from P4, has nothing to do, and transmits it to P2.
[Figure: all four receive queues hold M3 M2 M1. Token: seq=3, aru=2, aru_id=P2, rtr=(empty)]
An Example
• P1 receives the token from P4, has nothing to do, and transmits it to P2.
• P2 finds that aru_id in the token is itself and that it now holds M3.
• P2 updates aru to 3; it now knows that all nodes have received M1, M2, and M3.
• P2 transmits the token to P3.
[Figure: all four receive queues hold M3 M2 M1. Token as received by P2: seq=3, aru=2, aru_id=P2, rtr=(empty). Callout: if P2 now delivers M1, M2, M3 to the application, they are delivered in Safe Order.]
In Agreed/Safe Order?
• Agreed Order
If the processor holding the token delivers messages to the application in sequence order, the messages are delivered in Agreed Order.
• Safe Order
If the aru in the token is greater than or equal to a message's seq on two successive token visits, that message can be delivered in Safe Order.
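The token walk from the example can be replayed with a short simulation. This is only a sketch under simplifying assumptions: the class and method names (`Token`, `Processor`, `on_token`) are made up for illustration and are not corosync identifiers, a rebroadcast is assumed to reach every processor, and flow control, new broadcasts, and the two-rotation Safe Order check are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    seq: int                    # highest sequence number broadcast on the ring
    aru: int                    # all-received-up-to
    aru_id: str                 # processor that last set aru
    rtr: set = field(default_factory=set)   # sequence numbers to retransmit


@dataclass
class Processor:
    name: str
    received: dict = field(default_factory=dict)   # seq -> message

    def my_aru(self) -> int:
        """Largest n such that messages 1..n are all in the receive queue."""
        n = 0
        while n + 1 in self.received:
            n += 1
        return n

    def on_token(self, token: Token, ring: list) -> None:
        # Rebroadcast every requested message this processor holds.
        # Simplification: the rebroadcast reaches every processor.
        for seq in sorted(token.rtr):
            if seq in self.received:
                for peer in ring:
                    peer.received.setdefault(seq, self.received[seq])
                token.rtr.discard(seq)
        # Ask for anything up to token.seq that is still missing here.
        for seq in range(1, token.seq + 1):
            if seq not in self.received:
                token.rtr.add(seq)
        # Lower aru when behind; refresh it when this processor set it last.
        if self.my_aru() < token.aru or token.aru_id == self.name:
            token.aru = self.my_aru()
            token.aru_id = self.name


def run_example():
    msgs = {1: "M1", 2: "M2", 3: "M3"}
    ring = [Processor(n, dict(msgs)) for n in ("P1", "P2", "P3", "P4")]
    del ring[1].received[3]                   # P2 lost M3
    token = Token(seq=3, aru=3, aru_id="P1")  # the token as sent by P1
    trace = []
    for p in (ring[1], ring[2], ring[3], ring[0], ring[1]):  # P2 P3 P4 P1 P2
        p.on_token(token, ring)
        trace.append((p.name, token.aru, token.aru_id, sorted(token.rtr)))
    return trace
```

Calling `run_example()` returns one tuple per token visit (processor, aru, aru_id, rtr), so the drop of aru to 2 at P2 and its return to 3 one rotation later can be read off directly.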
The Membership Protocol
Runs in the Gather and Commit states.
When a new processor joins the cluster or an existing processor leaves it, the members form a new single ring.
An Example: new node join
P4 is a new node that joins the cluster.
The old ring is {P1,P2,P3} and its ring seq is 100. Each node in the old ring keeps the member list in my_proc_set.
When P4 joins the cluster, it broadcasts a join message (JoinMsg).
When P1, P2, and P3 receive the JoinMsg, they enter the Gather state.
[Figure: P4 broadcasts JoinMsg {sender_id: P4, proc_set: P4, fail_set: (empty), ring_seq: xx}. my_proc_set is P1P2P3 on P1, P2, and P3, and P4 on P4]
An Example: new node join
When P1, P2, and P3 receive the JoinMsg from P4, they merge its proc_set into their own my_proc_set.
P1, P2, and P3 then each broadcast a new JoinMsg.
On receiving a JoinMsg from another node, every node compares the proc_set in the JoinMsg with its own my_proc_set and marks consensus for the sender if they are the same.
[Figure: P1, P2, and P3 each broadcast JoinMsg {sender_id: own id, proc_set: P[1-4], fail_set: (empty), ring_seq: x}. my_proc_set is P1P2P3P4 on P1, P2, and P3, and still P4 on P4]
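The merge-and-compare rule for join messages can be sketched as follows. This is an illustrative model, not corosync code: `JoinMsg`, `Member`, and `run_rounds` are hypothetical names, timeouts are replaced by synchronous rounds in which every node rebroadcasts its current view, and no message is lost.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JoinMsg:
    sender_id: str
    proc_set: frozenset
    fail_set: frozenset = frozenset()


@dataclass
class Member:
    name: str
    my_proc_set: set
    my_fail_set: set = field(default_factory=set)
    consensus: dict = field(default_factory=dict)

    def __post_init__(self):
        self.consensus[self.name] = True      # a node always agrees with itself

    def make_join(self) -> JoinMsg:
        return JoinMsg(self.name, frozenset(self.my_proc_set),
                       frozenset(self.my_fail_set))

    def on_join(self, msg: JoinMsg) -> None:
        if msg.proc_set == self.my_proc_set and msg.fail_set == self.my_fail_set:
            self.consensus[msg.sender_id] = True        # same view: agreement
        elif msg.proc_set <= self.my_proc_set and msg.fail_set <= self.my_fail_set:
            pass                                        # sender is behind: ignore
        else:
            self.my_proc_set |= msg.proc_set            # learn new information
            self.my_fail_set |= msg.fail_set
            self.consensus = {self.name: True}          # old agreement is stale

    def has_consensus(self) -> bool:
        live = self.my_proc_set - self.my_fail_set
        return all(self.consensus.get(p, False) for p in live)


def run_rounds(members, max_rounds=10):
    """Each round, every member broadcasts its view; return the round of consensus."""
    for round_no in range(1, max_rounds + 1):
        msgs = [m.make_join() for m in members]
        for msg in msgs:
            for m in members:
                if m.name != msg.sender_id:
                    m.on_join(msg)
        if all(m.has_consensus() for m in members):
            return round_no
    return None
```

With P1, P2, and P3 starting from {P1,P2,P3} and P4 starting from {P4}, the first round spreads the merged view and the second round lets every node mark consensus for every other node.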
An Example: new node join
When a node finds that all members in its my_proc_set have reached consensus, and the node has the minimum id, it sends the Commit Token and enters the Commit state.
The Commit Token's ring_id.seq = max(old ring_id.seq, JoinMsg ring_id.seq) + 4.
Continuing from the previous slide, suppose that after several rounds P1, P3, and P4 have reached consensus,
but P2 has not received the message from P3, so in P2's consensus list consensus[P3]=false.
[Figure: my_proc_set is P1P2P3P4 on every node. consensus[All]=true on P1, P3, and P4; on P2, consensus[P3]=false and consensus[P1,2,4]=true.]
Commit Token: ring_id: 104/P1, memb_list: {P1}, memb_idx: P1
memb: { P1, old ring_id, old my_aru, high_delivered, received_flg }
P1 has the minimum id and transmits the Commit Token, but the token is discarded by P2; this triggers a token loss timeout, and P1 re-sends its JoinMsg.
P2's my_proc_set has not reached consensus, so P2 discards the Commit Token; this triggers a consensus timeout, and P2 re-sends its JoinMsg.
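The ring id calculation can be checked with a few lines. This is a sketch: `next_ring_id` is a hypothetical helper, node ids are compared as plain strings, and the value 0 used for P4's previous ring seq in the usage note is invented because the slides leave it unspecified.

```python
def next_ring_id(member_ids, old_ring_seq, join_ring_seqs):
    """Return (seq, representative) for the ring that is about to be formed."""
    # Take the largest ring seq seen so far and step it by 4, as on the slide.
    seq = max([old_ring_seq, *join_ring_seqs]) + 4
    # The member with the minimum id becomes the representative.
    representative = min(member_ids)
    return seq, representative
```

For the example ring, `next_ring_id(["P1", "P2", "P3", "P4"], 100, [100, 100, 0])` yields `(104, "P1")`, which corresponds to the ring_id 104/P1 carried by the Commit Token.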
An Example: new node join
The normal situation
After several rounds of receiving and sending JoinMsg, every Processor has marked consensus for all members of its my_proc_set.
P2 receives the Commit Token from P1, updates memb_list and memb_idx, transmits the Commit Token, and enters the Commit state.
Commit Token: ring_id: 104/P1, memb_list: {P1,P2}, memb_idx: P2
An Example: new node join
P3 receives the Commit Token from P2, updates memb_list and memb_idx, transmits the Commit Token, and enters the Commit state.
Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3}, memb_idx: P3
An Example: new node join
P4 receives the Commit Token from P3, updates memb_list and memb_idx, transmits the Commit Token, and enters the Commit state.
Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P4
An Example: new node join
P1 receives the Commit Token from P4. Because P1 is already in the Commit state, it knows that all members are in the Commit state.
P1 transmits the Commit Token again, enters the Recovery state, and sets its ring id (my_ring_id = Commit Token's ring_id).
Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P1
P1: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P1,P2,P3}, …
Each of the other three nodes: state: Commit, my_ring_id: 100/P1, my_new_memb: {}, my_trans_memb: {}, …
An Example: new node join
P2 transmits the Commit Token again, enters the Recovery state, and sets its ring id (my_ring_id = Commit Token's ring_id).
Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P2
P1 and P2: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P1,P2,P3}, …
P3 and P4: state: Commit, my_ring_id: 100/P1, my_new_memb: {}, my_trans_memb: {}, …
An Example: new node join
P3 transmits the Commit Token again, enters the Recovery state, and sets its ring id (my_ring_id = Commit Token's ring_id).
Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P3
P1, P2, and P3: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P1,P2,P3}, …
P4: state: Commit, my_ring_id: 100/P1, my_new_memb: {}, my_trans_memb: {}, …
An Example: new node join
P4 transmits the Commit Token again, enters the Recovery state, and sets its ring id (my_ring_id = Commit Token's ring_id).
Because P4 is a new member, its my_trans_memb contains only itself.
When P1 receives the Commit Token for the third time, every node has reached the Recovery state.
Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P4
P4: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P4}, …
P1, P2, and P3: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P1,P2,P3}, …
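The two rotations of the Commit Token in this example can be condensed into a small model. It is a sketch under assumptions: `Node` and `circulate_commit_token` are hypothetical names, the token is never lost, and a node's transitional membership is taken to be the new-ring members that report the same old ring id, which reproduces the my_trans_memb values shown on the slides. The old ring id (0, "P4") for the joining node in the usage note is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    old_ring_id: tuple                 # (seq, representative) of the previous ring
    state: str = "gather"
    my_ring_id: tuple = ()
    my_new_memb: list = field(default_factory=list)
    my_trans_memb: list = field(default_factory=list)

    def __post_init__(self):
        self.my_ring_id = self.old_ring_id


def circulate_commit_token(ring, new_ring_id):
    """ring is in token order and ring[0] is the representative."""
    memb_list = []                     # (name, old_ring_id) entries on the token
    # First rotation: every node appends its entry and enters Commit.
    for node in ring:
        memb_list.append((node.name, node.old_ring_id))
        node.state = "commit"
    # Second rotation: the list is complete, so every node installs the new ring.
    for node in ring:
        node.state = "recovery"
        node.my_ring_id = new_ring_id
        node.my_new_memb = [name for name, _ in memb_list]
        node.my_trans_memb = [name for name, old in memb_list
                              if old == node.old_ring_id]
    return memb_list
```

Running it on P1, P2, P3 with old ring id (100, "P1") and P4 with (0, "P4") leaves all four nodes in the recovery state with my_ring_id (104, "P1"); P1 to P3 get my_trans_memb [P1, P2, P3] and P4 gets [P4].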
The Recovery Protocol
Runs in the Recovery state.
Handles the transition from the Old Ring to the New Ring: messages from the Old Ring are recovered so that they can still be delivered in Agreed Order or Safe Order.
In the Recovery state, messages that the Application submits for the New Ring cannot be broadcast yet; they must wait until the Operational state.
The Recovery Protocol
• Step 1:
Exchange messages with the other processors that come from the same Old Ring (similar to the Operational state).
Note: one New Ring may be formed from multiple Old Rings.
• Step 2:
Deliver to the Application, in the Old configuration, the messages that are in Agreed/Safe Order (message.seq <= high_ring_delivered).
The Recovery Protocol
• Step 3:
Deliver the first ConfigChange Msg (Transitional Configuration) to the Application.
The first ConfigChange Msg contains the list of Old Ring members that also belong to the New Ring.
• Step 4:
Deliver to the Application the messages that are in Agreed/Safe Order in the Transitional Configuration.
The Recovery Protocol
• Step 5:
Deliver the second ConfigChange Msg (New Configuration) to the Application.
The second ConfigChange Msg contains the member list of the New Ring.
• Step 6:
Move from the Recovery state to the Operational state.
Steps 2 to 6 require no message exchange with other Processors; they are performed as one atomic operation.
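The local part of recovery (steps 2 to 5) amounts to emitting deliveries and configuration changes in a fixed order. The sketch below shows only that ordering. `recovery_deliveries` is a hypothetical function, `pending` is assumed to hold the old-ring messages recovered in step 1 that the application has not seen yet, and the seq threshold is used as the only split between old-configuration and transitional-configuration deliveries.

```python
def recovery_deliveries(pending, high_ring_delivered, old_members, new_members):
    """Return the ordered events one processor hands to its application.

    pending: dict mapping seq -> message for recovered, undelivered messages.
    """
    # Old Ring members that also belong to the New Ring.
    trans_members = [m for m in old_members if m in new_members]
    events = []
    # Step 2: messages that belong to the old configuration.
    for seq in sorted(pending):
        if seq <= high_ring_delivered:
            events.append(("deliver", pending[seq]))
    # Step 3: first configuration change (transitional configuration).
    events.append(("config_change", "transitional", trans_members))
    # Step 4: remaining messages, delivered in the transitional configuration.
    for seq in sorted(pending):
        if seq > high_ring_delivered:
            events.append(("deliver", pending[seq]))
    # Step 5: second configuration change (new regular configuration).
    events.append(("config_change", "regular", list(new_members)))
    return events
```

Because the function uses only local state, it mirrors the remark that these steps need no further message exchange with other processors.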
Totem Redundant Ring Protocol
• Built on SRP (you can think of it as a layer that modifies SRP's send/receive paths).
• Improves reliability by configuring an extra network interface, so that nodes can keep communicating even if one network goes offline.
Totem Redundant Ring Protocol
• Active replication
All messages are transmitted on all N channels.
Every message is received N times.
The more channels (larger N), the higher the bandwidth cost for a Processor.
• Passive replication
Every message is transmitted on one of the N channels.
Every message is received once.
The bandwidth cost for a Processor is the same as with a single ring.
• Active-passive replication
A mixture of active and passive: every message is transmitted on K channels (1 < K < N).
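The difference between the three styles comes down to which channels carry a given message. A minimal sketch, assuming channels are numbered 0 to N-1 and that the passive and active-passive styles rotate through the channels round-robin (the slides do not say how channels are chosen); `channels_for` is a hypothetical helper:

```python
def channels_for(msg_index, n, style, k=2):
    """Return the channel numbers used to send message number msg_index."""
    if style == "active":
        return list(range(n))                  # every channel carries a copy
    if style == "passive":
        return [msg_index % n]                 # exactly one channel per message
    if style == "active-passive":
        if not 1 < k < n:
            raise ValueError("active-passive needs 1 < k < n")
        return [(msg_index + i) % n for i in range(k)]   # k of the n channels
    raise ValueError(f"unknown replication style: {style}")
```

The length of the returned list is the number of copies of each message on the wire, which makes the bandwidth trade-off explicit: N copies for active, one for passive, and K for active-passive.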
Q&A
Thanks

More Related Content

What's hot

Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Mydbops
 
Debug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsDebug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsVipin Varghese
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveNetronome
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and consFabio Fumarola
 
Introduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technologyIntroduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technology義洋 顏
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpJames Denton
 
Implementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetImplementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetJames Wernicke
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)Kirill Tsym
 
mpeg2ts1_es_pes_ps_ts_psi
mpeg2ts1_es_pes_ps_ts_psimpeg2ts1_es_pes_ps_ts_psi
mpeg2ts1_es_pes_ps_ts_psihexiay
 
EBPF and Linux Networking
EBPF and Linux NetworkingEBPF and Linux Networking
EBPF and Linux NetworkingPLUMgrid
 
Browsing Linux Kernel Source
Browsing Linux Kernel SourceBrowsing Linux Kernel Source
Browsing Linux Kernel SourceMotaz Saad
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
Network-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQNetwork-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQICS
 

What's hot (20)

Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
Debug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsDebug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpoints
 
Userspace networking
Userspace networkingUserspace networking
Userspace networking
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep Dive
 
Dpdk performance
Dpdk performanceDpdk performance
Dpdk performance
 
Intel dpdk Tutorial
Intel dpdk TutorialIntel dpdk Tutorial
Intel dpdk Tutorial
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and cons
 
Introduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technologyIntroduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technology
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
 
Qemu Introduction
Qemu IntroductionQemu Introduction
Qemu Introduction
 
Implementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over EthernetImplementation &amp; Comparison Of Rdma Over Ethernet
Implementation &amp; Comparison Of Rdma Over Ethernet
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
mpeg2ts1_es_pes_ps_ts_psi
mpeg2ts1_es_pes_ps_ts_psimpeg2ts1_es_pes_ps_ts_psi
mpeg2ts1_es_pes_ps_ts_psi
 
EBPF and Linux Networking
EBPF and Linux NetworkingEBPF and Linux Networking
EBPF and Linux Networking
 
Browsing Linux Kernel Source
Browsing Linux Kernel SourceBrowsing Linux Kernel Source
Browsing Linux Kernel Source
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
Basic Linux Internals
Basic Linux InternalsBasic Linux Internals
Basic Linux Internals
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
Network-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQNetwork-Connected Development with ZeroMQ
Network-Connected Development with ZeroMQ
 

Viewers also liked

Viewers also liked (20)

PostgreSQL Write-Ahead Log (Heikki Linnakangas)
PostgreSQL Write-Ahead Log (Heikki Linnakangas) PostgreSQL Write-Ahead Log (Heikki Linnakangas)
PostgreSQL Write-Ahead Log (Heikki Linnakangas)
 
Totem y tabu
Totem y  tabuTotem y  tabu
Totem y tabu
 
Totem协议(SRP/RRP)讲解
Totem协议(SRP/RRP)讲解Totem协议(SRP/RRP)讲解
Totem协议(SRP/RRP)讲解
 
tabú y totem
tabú y totemtabú y totem
tabú y totem
 
PostgreSQL 資料可靠性及WAL
PostgreSQL 資料可靠性及WALPostgreSQL 資料可靠性及WAL
PostgreSQL 資料可靠性及WAL
 
Linux Cluster next generation
Linux Cluster next generationLinux Cluster next generation
Linux Cluster next generation
 
Cluster pitfalls recommand
Cluster pitfalls recommandCluster pitfalls recommand
Cluster pitfalls recommand
 
MAXATMA GANDHI
MAXATMA GANDHIMAXATMA GANDHI
MAXATMA GANDHI
 
νοτια αμερικη.
νοτια αμερικη.νοτια αμερικη.
νοτια αμερικη.
 
El burro de sancho
El burro de sanchoEl burro de sancho
El burro de sancho
 
Network Conference LMS Big Data Final 1.24.14
Network Conference LMS Big Data Final 1.24.14Network Conference LMS Big Data Final 1.24.14
Network Conference LMS Big Data Final 1.24.14
 
Corosync and Pacemaker
Corosync and PacemakerCorosync and Pacemaker
Corosync and Pacemaker
 
Totem y tabu
Totem y tabuTotem y tabu
Totem y tabu
 
iSCSI introduction and usage
iSCSI introduction and usageiSCSI introduction and usage
iSCSI introduction and usage
 
Tótem y Tabú
Tótem y TabúTótem y Tabú
Tótem y Tabú
 
Tabu
TabuTabu
Tabu
 
Home care colombian supermarket follow up v2
Home care colombian supermarket follow up v2Home care colombian supermarket follow up v2
Home care colombian supermarket follow up v2
 
Open stack HA - Theory to Reality
Open stack HA -  Theory to RealityOpen stack HA -  Theory to Reality
Open stack HA - Theory to Reality
 
Commercial Passive House Case Studies
Commercial Passive House Case StudiesCommercial Passive House Case Studies
Commercial Passive House Case Studies
 
Totem Pole PowerPoint
Totem Pole PowerPointTotem Pole PowerPoint
Totem Pole PowerPoint
 

Similar to Totem

Presentation
PresentationPresentation
PresentationLior Boim
 
Ch3 transport layer Network
Ch3 transport layer NetworkCh3 transport layer Network
Ch3 transport layer Networkcairo university
 
Mobile Transpot Layer
Mobile Transpot LayerMobile Transpot Layer
Mobile Transpot LayerMaulik Patel
 
Tcp Ip Overview
Tcp Ip OverviewTcp Ip Overview
Tcp Ip OverviewAmir Malik
 
Adhoc and Sensor Networks - Chapter 07
Adhoc and Sensor Networks - Chapter 07Adhoc and Sensor Networks - Chapter 07
Adhoc and Sensor Networks - Chapter 07Ali Habeeb
 
Transport Layer in Computer Networks (TCP / UDP / SCTP)
Transport Layer in Computer Networks (TCP / UDP / SCTP)Transport Layer in Computer Networks (TCP / UDP / SCTP)
Transport Layer in Computer Networks (TCP / UDP / SCTP)Hamidreza Bolhasani
 
5-LEC- 5.pptxTransport Layer. Transport Layer Protocols
5-LEC- 5.pptxTransport Layer.  Transport Layer Protocols5-LEC- 5.pptxTransport Layer.  Transport Layer Protocols
5-LEC- 5.pptxTransport Layer. Transport Layer ProtocolsZahouAmel1
 
Transport_Layer_Protocols.pptx
Transport_Layer_Protocols.pptxTransport_Layer_Protocols.pptx
Transport_Layer_Protocols.pptxAnkitKumar891632
 
CNS_Module-2-ppt.pptx
CNS_Module-2-ppt.pptxCNS_Module-2-ppt.pptx
CNS_Module-2-ppt.pptxHIMANKMISHRA2
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPDilum Bandara
 
Analytical Research of TCP Variants in Terms of Maximum Throughput
Analytical Research of TCP Variants in Terms of Maximum ThroughputAnalytical Research of TCP Variants in Terms of Maximum Throughput
Analytical Research of TCP Variants in Terms of Maximum ThroughputIJLT EMAS
 
Huawei_HCNA_Routing_and_Switching.pdf
Huawei_HCNA_Routing_and_Switching.pdfHuawei_HCNA_Routing_and_Switching.pdf
Huawei_HCNA_Routing_and_Switching.pdfPauloDiniz60
 
9-Lect_9-2.pptx DataLink Layer DataLink Layer
9-Lect_9-2.pptx DataLink Layer DataLink Layer9-Lect_9-2.pptx DataLink Layer DataLink Layer
9-Lect_9-2.pptx DataLink Layer DataLink LayerZahouAmel1
 
Transport Layer Services : Multiplexing And Demultiplexing
Transport Layer Services : Multiplexing And DemultiplexingTransport Layer Services : Multiplexing And Demultiplexing
Transport Layer Services : Multiplexing And DemultiplexingKeyur Vadodariya
 
Tugas komjar 7-yee
Tugas komjar 7-yeeTugas komjar 7-yee
Tugas komjar 7-yeeramasatriaf
 

Similar to Totem (20)

Presentation
PresentationPresentation
Presentation
 
Ch3 transport layer Network
Ch3 transport layer NetworkCh3 transport layer Network
Ch3 transport layer Network
 
Week4 lec1-bscs1
Week4 lec1-bscs1Week4 lec1-bscs1
Week4 lec1-bscs1
 
Mobile Transpot Layer
Mobile Transpot LayerMobile Transpot Layer
Mobile Transpot Layer
 
Tcp Ip Overview
Tcp Ip OverviewTcp Ip Overview
Tcp Ip Overview
 
Lec6
Lec6Lec6
Lec6
 
Adhoc and Sensor Networks - Chapter 07
Adhoc and Sensor Networks - Chapter 07Adhoc and Sensor Networks - Chapter 07
Adhoc and Sensor Networks - Chapter 07
 
Transport Layer in Computer Networks (TCP / UDP / SCTP)
Transport Layer in Computer Networks (TCP / UDP / SCTP)Transport Layer in Computer Networks (TCP / UDP / SCTP)
Transport Layer in Computer Networks (TCP / UDP / SCTP)
 
5-LEC- 5.pptxTransport Layer. Transport Layer Protocols
5-LEC- 5.pptxTransport Layer.  Transport Layer Protocols5-LEC- 5.pptxTransport Layer.  Transport Layer Protocols
5-LEC- 5.pptxTransport Layer. Transport Layer Protocols
 
transport layer
transport layertransport layer
transport layer
 
Transport_Layer_Protocols.pptx
Transport_Layer_Protocols.pptxTransport_Layer_Protocols.pptx
Transport_Layer_Protocols.pptx
 
Mod4
Mod4Mod4
Mod4
 
CNS_Module-2-ppt.pptx
CNS_Module-2-ppt.pptxCNS_Module-2-ppt.pptx
CNS_Module-2-ppt.pptx
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCP
 
Analytical Research of TCP Variants in Terms of Maximum Throughput
Analytical Research of TCP Variants in Terms of Maximum ThroughputAnalytical Research of TCP Variants in Terms of Maximum Throughput
Analytical Research of TCP Variants in Terms of Maximum Throughput
 
Lte imp
Lte impLte imp
Lte imp
 
Huawei_HCNA_Routing_and_Switching.pdf
Huawei_HCNA_Routing_and_Switching.pdfHuawei_HCNA_Routing_and_Switching.pdf
Huawei_HCNA_Routing_and_Switching.pdf
 
9-Lect_9-2.pptx DataLink Layer DataLink Layer
9-Lect_9-2.pptx DataLink Layer DataLink Layer9-Lect_9-2.pptx DataLink Layer DataLink Layer
9-Lect_9-2.pptx DataLink Layer DataLink Layer
 
Transport Layer Services : Multiplexing And Demultiplexing
Transport Layer Services : Multiplexing And DemultiplexingTransport Layer Services : Multiplexing And Demultiplexing
Transport Layer Services : Multiplexing And Demultiplexing
 
Tugas komjar 7-yee
Tugas komjar 7-yeeTugas komjar 7-yee
Tugas komjar 7-yee
 

Recently uploaded

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 

Recently uploaded (20)

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Past, Present and Future of Generative AI

Totem

  • 2. Guideline: Introduction; Totem Single Ring Protocol; Totem Redundant Ring Protocol
  • 3. Introduction. SRP (the Totem Single-Ring Ordering and Membership Protocol): supports high-performance, fault-tolerant distributed systems that continue to operate despite network partitioning and remerging, and despite processors failing and restarting. It provides totally ordered message delivery with low overhead, high throughput and low latency using a logical token-passing ring, and rapid detection of network partitioning and processor failure together with reconfiguration and membership services. RRP (the Totem Redundant Ring Protocol): based on SRP (you can think of this layer as a modified recv/send path of SRP). It makes communication more reliable, even when a node's network link is offline, by configuring an extra network interface.
  • 4. Introduction. Processor: a corosync node that is a member of the CPG (Closed Process Group). Application: a program that uses corosync to communicate, for example pacemaker, dlm, sheepdog.
  • 6. Introduction. Broadcast: one Processor => all Processors. Transmit/forward token: one Processor => next Processor. Delivery: one Processor => associated Application.
  • 7. Reliable Ordered Delivery Services. Reliable Delivery for Configuration C: Each message has a unique identifier. If processor p delivers message m, p delivers m only once. If p delivers two different messages, then p delivers one of them strictly before it delivers the other. If p originates message m, then p will deliver m, or will fail before delivering a Configuration Change message to install a new regular configuration. If p is a member of regular configuration C and no configuration change occurs, then p will deliver in C all the messages originated in C. If p delivers message m originated in C, then p is a member of C. If p and q are both members of configurations C1 and C2, then p and q deliver the same set of messages in C1 before delivering a Configuration Change message that terminates C1 and starts C2.
  • 8. Reliable Ordered Delivery Services. Delivery in Causal Order: the delivery order respects Lamport causality within a configuration. Delivery in Agreed Order: guarantees that processors deliver messages in a consistent total order; when a processor delivers a message, it has already delivered all preceding messages in the same total order. Delivery in Safe Order: when a processor delivers a message, it has determined that every processor in the current configuration has received the message and will deliver it unless that processor fails.
  • 9. Totem Single Ring Protocol
  • 10. Totem Single Ring Protocol: the Totem Ordering Protocol (OP), the Membership Protocol (MP), the Recovery Protocol (RP).
  • 11. The Totem Ordering Protocol. Runs in the Operational state. It ensures that messages are delivered to the Application in Agreed Order or Safe Order; the Application can specify which of the two it wants. A processor uses the token to deliver messages in total order, one by one.
  • 12. An Example. A1 asks P1 to send three messages: M1, M2, M3 (held in P1's request queue). Suppose P1 holds the token: it broadcasts M1, M2, M3. P1 also saves them in its own receive queue. Ring order: P1, P2, P3, P4.
  • 13. An Example. P2 received only M1 and M2, while P3 and P4 received M1, M2 and M3. P1 transmits the token to P2; in the token, seq indicates that the highest sequence number in P1's receive queue is 3. P2 compares seq with the messages it holds and finds that M3 was lost. Token: seq:3, aru:3, aru_id:P1, rtr: (empty). Receive queues: P1: M1 M2 M3; P2: M1 M2; P3: M1 M2 M3; P4: M1 M2 M3.
  • 14. An Example. P2 lowers aru (all-received-up-to) to 2 in the token and adds 3 to rtr (the retransmission request list), then transmits the token to P3. On receiving the token, P3 broadcasts M3 to the cluster. After clearing rtr, P3 transmits the token to P4. Token as sent by P2: seq:3, aru:2, aru_id:P2, rtr:3. Receive queues: P1: M1 M2 M3; P2: M1 M2; P3: M1 M2 M3; P4: M1 M2 M3.
  • 15. An Example. P2 receives the message M3 broadcast by P3; the other processors ignore M3 because they already hold it. P4 receives the token transmitted by P3, has nothing to do, and transmits the token to P1. Token: seq:3, aru:2, aru_id:P2, rtr: (empty). Receive queues: all four processors hold M1 M2 M3.
  • 16. An Example. P1 receives the token transmitted by P4, has nothing to do, and transmits it to P2. Token: seq:3, aru:2, aru_id:P2, rtr: (empty). Receive queues: all four processors hold M1 M2 M3.
  • 17. An Example. P1 receives the token transmitted by P4, has nothing to do, and transmits it to P2. P2 finds that the aru_id in the token is itself and that it now holds M3. It then updates aru to 3, and P2 knows that all nodes have received M1, M2 and M3. P2 transmits the token to P3. Token as received by P2: seq:3, aru:2, aru_id:P2, rtr: (empty). Receive queues: all four processors hold M1 M2 M3. If P2 now delivers M1, M2, M3 to the application, the delivery is in Safe Order.
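The token walk on slides 12 to 17 can be replayed with a small simulation. This is only a sketch under stated assumptions, not corosync's implementation: the `Token` and `Processor` classes, the `on_token` method and the rule "raise aru only if you were the one who lowered it" are hypothetical names and simplifications chosen to match the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Token:
    seq: int                      # highest sequence number broadcast so far
    aru: int                      # all-received-up-to
    aru_id: Optional[str] = None  # processor that last lowered aru
    rtr: Set[int] = field(default_factory=set)  # retransmission requests


class Processor:
    def __init__(self, name: str, received: Dict[int, str]):
        self.name = name
        self.recv = dict(received)  # receive queue: seq -> message

    def my_aru(self) -> int:
        """Largest n such that messages 1..n are all in the receive queue."""
        n = 0
        while n + 1 in self.recv:
            n += 1
        return n

    def on_token(self, token: Token, ring: List["Processor"]) -> None:
        # 1. Rebroadcast any requested message that this processor holds.
        for seq in sorted(token.rtr):
            if seq in self.recv:
                for peer in ring:
                    peer.recv.setdefault(seq, self.recv[seq])
                token.rtr.discard(seq)
        # 2. Request every message up to token.seq that is still missing here.
        token.rtr.update(s for s in range(1, token.seq + 1) if s not in self.recv)
        # 3. Lower aru if this processor is behind; raise it again only if
        #    this processor was the one that lowered it (simplified rule).
        mine = self.my_aru()
        if mine < token.aru:
            token.aru, token.aru_id = mine, self.name
        elif token.aru_id == self.name:
            token.aru = mine


msgs = {1: "M1", 2: "M2", 3: "M3"}
p1 = Processor("P1", msgs)
p2 = Processor("P2", {1: "M1", 2: "M2"})  # M3 was lost on the way to P2
p3 = Processor("P3", msgs)
p4 = Processor("P4", msgs)
ring = [p1, p2, p3, p4]

token = Token(seq=3, aru=3, aru_id="P1")
history = []
for holder in [p2, p3, p4, p1, p2]:  # token path after P1's broadcast
    holder.on_token(token, ring)
    history.append((holder.name, token.aru, token.aru_id, sorted(token.rtr)))

for step in history:
    print(step)
```

Each printed tuple is the token as it leaves the named processor: the first line corresponds to slide 14 (aru lowered to 2, M3 requested) and the last to slide 17 (P2 raises aru back to 3).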
  • 18. In Agreed/Safe Order? Agreed Order: if the processor that holds the token delivers messages to the application in sequence order, the messages are in Agreed Order. Safe Order: if the aru in the token is no less than a message's seq on two successive token rotations, the message is in Safe Order.
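The two conditions on slide 18 can be written as predicates. This is a minimal sketch, assuming that "covered by aru" means `aru >= seq` and that a processor remembers the aru it saw on its previous token visit; the function and parameter names are hypothetical.

```python
def agreed_deliverable(msg_seq: int, my_aru: int) -> bool:
    """Agreed Order: every message up to msg_seq is already in the local
    receive queue, so delivering in sequence order leaves no gap."""
    return msg_seq <= my_aru


def safe_deliverable(msg_seq: int, aru_previous_visit: int, aru_current_visit: int) -> bool:
    """Safe Order: the token's aru covered msg_seq on two successive rotations."""
    return msg_seq <= min(aru_previous_visit, aru_current_visit)


# A processor that holds messages 1..3 and saw aru=3 on two successive visits:
print(agreed_deliverable(3, my_aru=3))
print(safe_deliverable(3, aru_previous_visit=3, aru_current_visit=3))
# If one of the two visits showed a lower aru, message 3 is not yet safe:
print(safe_deliverable(3, aru_previous_visit=2, aru_current_visit=3))
```

Taking the minimum of the two observed aru values is one way to express "two successive rotations"; a message that fails the check simply waits for a later token visit.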
  • 19. The Membership Protocol. Runs in the Gather and Commit states. When a new processor joins the cluster, or an existing processor leaves it, the processors form a new single ring.
  • 20. An Example: new node join. P4 is a new node that joins the cluster. The old ring is {P1,P2,P3}, and its ring seq is 100. For nodes in the old ring, my_proc_set stores the member list. When P4 joins the cluster, it broadcasts a join message. When P1, P2 and P3 receive the join message, they enter the Gather state. JoinMsg from P4: sender_id: P4, proc_set: P4, fail_set: (empty), ring_seq: xx. State: P1, P2, P3: my_proc_set: P1 P2 P3; P4: my_proc_set: P4.
  • 21. An Example: new node join. When P1, P2 and P3 receive the JoinMsg from P4, they merge the proc_set from the JoinMsg into their own my_proc_set. P1, P2 and P3 then broadcast a new JoinMsg. On receiving a JoinMsg from another node, every node compares the proc_set in the JoinMsg with its own my_proc_set and marks the sender as having reached consensus if the two are the same. JoinMsgs from P1, P2 and P3: sender_id: P1 / P2 / P3, proc_set: P[1-4], fail_set: (empty), ring_seq: x. State: P1, P2, P3: my_proc_set: P1 P2 P3 P4; P4: my_proc_set: P4.
  • 22. An Example: new node join. When a node finds that all members in its my_proc_set have reached consensus, and the node has the minimum id, it sends the Commit Token and enters the Commit state. CommitToken's ring_id.seq = max(old ring_id, JoinMsg's ring_id) + 4. Continuing from the previous slide, suppose that after several rounds P1, P3 and P4 have reached consensus, but P2 did not receive the message from P3, so in P2's consensus list consensus[P3]=false. P1 has the minimum id and transmits the commit token, but the token is discarded by P2; this triggers a token loss timeout on P1, which re-sends its JoinMsg. P2's my_proc_set has not reached consensus, so P2 discards the commit token; this triggers a consensus timeout on P2, which re-sends its JoinMsg. State: P1, P3, P4: my_proc_set: P1 P2 P3 P4, consensus[All]=true; P2: my_proc_set: P1 P2 P3 P4, consensus[P3]=false, consensus[P1,2,4]=true. Commit Token: ring_id: 104/P1, memb_list: {P1}, memb_idx: P1. Each memb entry: { P1, old ring_id, old my_aru, high_delivered, received_flg }.
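The two rules on slide 22 (who sends the Commit Token, and which ring sequence number it carries) fit in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, node ids are compared as plain strings, and the ring_seq of P4's JoinMsg is not given on the slides, so `0` below is an assumed placeholder.

```python
from typing import Dict, Iterable, Set


def should_send_commit_token(my_id: str, my_proc_set: Set[str],
                             consensus: Dict[str, bool]) -> bool:
    """True when every member of my_proc_set has reached consensus and this
    node has the minimum id among them."""
    all_agree = all(consensus.get(member, False) for member in my_proc_set)
    return all_agree and my_id == min(my_proc_set)


def commit_ring_seq(old_ring_seq: int, join_ring_seqs: Iterable[int]) -> int:
    """ring_id.seq = max(old ring_id, JoinMsg ring_ids) + 4, as on the slide."""
    return max([old_ring_seq, *join_ring_seqs]) + 4


members = {"P1", "P2", "P3", "P4"}
all_true = {m: True for m in members}
p2_view = dict(all_true, P3=False)  # P2 never heard P3's JoinMsg

print(should_send_commit_token("P1", members, all_true))  # minimum id, full consensus
print(should_send_commit_token("P3", members, all_true))  # consensus, but not minimum id
print(should_send_commit_token("P2", members, p2_view))   # no consensus yet
print(commit_ring_seq(100, [0]))                          # 0 is an assumed ring_seq for P4
```

With the old ring at seq 100, the computed value is 104, which matches the `ring_id: 104/P1` shown in the Commit Token.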
  • 23. An Example: new node join. The normal case: after several rounds of receiving and sending JoinMsg, every processor's my_proc_set is marked as having reached consensus. P2 receives the Commit Token from P1, updates memb_list and memb_idx, then transmits the Commit Token and enters the Commit state. Commit Token: ring_id: 104/P1, memb_list: {P1,P2}, memb_idx: P2.
  • 24. An Example: new node join. P3 receives the Commit Token from P2, updates memb_list and memb_idx, then transmits the Commit Token and enters the Commit state. Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3}, memb_idx: P3.
  • 25. An Example: new node join. P4 receives the Commit Token from P3, updates memb_list and memb_idx, then transmits the Commit Token and enters the Commit state. Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P4.
  • 26. An Example: new node join. P1 receives the Commit Token from P4. Because P1 is itself in the Commit state, it knows that all members are in the Commit state. P1 transmits the Commit Token again, enters the Recovery state, and sets its ring id (my_ring_id = CommitToken's ring_id). Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P1. P1: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P1,P2,P3}, … The other three nodes: state: Commit, my_ring_id: 100/P1, my_new_memb: {}, my_trans_memb: {}, …
  • 27. An Example: new node join. P2 transmits the Commit Token again, enters the Recovery state, and sets its ring id (my_ring_id = CommitToken's ring_id). Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P2. P1 and P2: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P1,P2,P3}, … The other two nodes: state: Commit, my_ring_id: 100/P1, my_new_memb: {}, my_trans_memb: {}, …
  • 28. An Example: new node join. P3 transmits the Commit Token again, enters the Recovery state, and sets its ring id (my_ring_id = CommitToken's ring_id). Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P3. P1, P2 and P3: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P1,P2,P3}, … The remaining node: state: Commit, my_ring_id: 100/P1, my_new_memb: {}, my_trans_memb: {}, …
  • 29. An Example: new node join. P4 transmits the Commit Token again, enters the Recovery state, and sets its ring id (my_ring_id = CommitToken's ring_id). Because P4 is a new member, its my_trans_memb contains only itself. When P1 receives the Commit Token for the third time, every node has reached the Recovery state. Commit Token: ring_id: 104/P1, memb_list: {P1,P2,P3,P4}, memb_idx: P4. P4: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P4}, … P1, P2 and P3: state: Recovery, my_ring_id: 104/P1, my_new_memb: {P1,P2,P3,P4}, my_trans_memb: {P1,P2,P3}, …
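The Commit Token's first rotation (slides 22 to 25) and the my_trans_memb values shown on slides 26 to 29 can be reproduced with the sketch below. It assumes that my_trans_memb is the set of new-ring members that also belonged to the node's old ring, which is consistent with the values on the slides; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def first_rotation(ring_order: List[str]) -> List[Tuple[str, List[str]]]:
    """Each node appends itself to memb_list and forwards the Commit Token.
    Returns (memb_idx, memb_list) as the token leaves each node."""
    memb_list: List[str] = []
    snapshots = []
    for node in ring_order:
        memb_list.append(node)
        snapshots.append((node, list(memb_list)))
    return snapshots


def trans_memb(old_ring: List[str], new_memb: List[str]) -> List[str]:
    """Members of the new ring that were also in this node's old ring."""
    return [node for node in new_memb if node in old_ring]


ring_order = ["P1", "P2", "P3", "P4"]
snapshots = first_rotation(ring_order)
new_memb = snapshots[-1][1]

# Old ring of each node before the join: P4 was on its own.
old_rings: Dict[str, List[str]] = {
    "P1": ["P1", "P2", "P3"],
    "P2": ["P1", "P2", "P3"],
    "P3": ["P1", "P2", "P3"],
    "P4": ["P4"],
}

for memb_idx, memb_list in snapshots:
    print("commit token leaves", memb_idx, "memb_list =", memb_list)
for node in ring_order:
    print(node, "my_new_memb =", new_memb, "my_trans_memb =", trans_memb(old_rings[node], new_memb))
```

On the second rotation every node already sees the full memb_list, which is why each one can fill in my_new_memb and my_trans_memb as it enters the Recovery state.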
  • 30. The Recovery Protocol. Runs in the Recovery state. It handles the transition from the Old Ring to the New Ring by recovering messages from the Old Ring so that they are still delivered in Agreed Order or Safe Order. In the Recovery state, messages that the Application submits for the New Ring cannot be broadcast (that requires the Operational state).
  • 31. The Recovery Protocol. Step 1: Exchange messages with the other processors that come from the same Old Ring (similar to the Operational state). Note: one New Ring may contain processors from multiple Old Rings. Step 2: Deliver to the Application, in the Old configuration, the messages that are in Agreed/Safe Order (message.seq<=high_ring_delivered).
  • 32. The Recovery Protocol. Step 3: Deliver the first ConfigChange message (Transitional Configuration) to the Application. The first ConfigChange message contains the list of Old Ring members that also belong to the New Ring. Step 4: Deliver to the Application the messages (in the Transitional Configuration) that are in Agreed/Safe Order.
  • 33. The Recovery Protocol. Step 5: Deliver the second ConfigChange message (New Configuration) to the Application. The second ConfigChange message contains the member list of the New Ring. Step 6: Enter the Operational state from the Recovery state. Steps 2 to 6 do not need to exchange messages with other processors; together they form one atomic operation.
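Steps 2 to 5 describe a fixed order of events handed to the Application. The sketch below lists that order for one processor. It is an illustration under assumptions: the event tuples and the function name are hypothetical, the message contents are made up for the example, and it assumes that the messages left for the Transitional Configuration are exactly those with seq greater than high_ring_delivered.

```python
from typing import Dict, List, Tuple


def recovery_deliveries(old_ring_msgs: Dict[int, str], high_ring_delivered: int,
                        trans_memb: List[str], new_memb: List[str]) -> List[Tuple]:
    """Return the events delivered to the Application in steps 2 to 5."""
    events: List[Tuple] = []
    ordered = sorted(old_ring_msgs.items())  # deliver in sequence order
    # Step 2: messages that belong to the Old configuration.
    for seq, msg in ordered:
        if seq <= high_ring_delivered:
            events.append(("message", "old", msg))
    # Step 3: first ConfigChange, the Transitional Configuration.
    events.append(("config_change", "transitional", tuple(trans_memb)))
    # Step 4: remaining recovered messages, in the Transitional Configuration.
    for seq, msg in ordered:
        if seq > high_ring_delivered:
            events.append(("message", "transitional", msg))
    # Step 5: second ConfigChange, the New Configuration.
    events.append(("config_change", "new", tuple(new_memb)))
    return events


events = recovery_deliveries(
    old_ring_msgs={1: "M1", 2: "M2", 3: "M3"},  # example data, not from the slides
    high_ring_delivered=2,
    trans_memb=["P1", "P2", "P3"],
    new_memb=["P1", "P2", "P3", "P4"],
)
for event in events:
    print(event)
```

Because the function needs no input from other processors once Step 1 has filled `old_ring_msgs`, it mirrors the slide's remark that steps 2 to 6 run locally as one atomic operation.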
  • 34. Totem Redundant Ring Protocol. Based on SRP (you can think of this layer as a modified recv/send path of SRP). It makes communication more reliable, even when a node's network link is offline, by configuring an extra network interface.
  • 35. Totem Redundant Ring Protocol. Active replication: every message is transmitted on all N channels, so every message is received N times; the more channels (larger N), the higher the bandwidth cost for a Processor. Passive replication: every message is transmitted on one of the N channels, so every message is received once; the bandwidth cost for a Processor is the same as with a single ring. Active-passive replication: a mixture of active and passive, in which every message is transmitted on K channels (1<K<N).
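The bandwidth trade-off between the three replication styles comes down to how many copies of each message a Processor sends. A minimal sketch, with a hypothetical function name and style strings:

```python
from typing import Optional


def copies_sent(style: str, n: int, k: Optional[int] = None) -> int:
    """Number of channels one message is transmitted on, out of n channels."""
    if style == "active":
        return n  # every channel carries every message
    if style == "passive":
        return 1  # exactly one channel carries each message
    if style == "active-passive":
        if k is None or not (1 < k < n):
            raise ValueError("active-passive needs k with 1 < k < n")
        return k
    raise ValueError(f"unknown replication style: {style}")


n = 3
for style, k in [("active", None), ("passive", None), ("active-passive", 2)]:
    print(style, "->", copies_sent(style, n, k), "copies per message on", n, "channels")
```

With N = 3, active replication triples the per-message traffic, passive replication keeps it at the single-ring level, and active-passive with K = 2 sits in between.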
  • 36. Q&A