Gossip protocol and applications
Tu Nguyen
Staff Software Engineer - Axon
Gossip protocol
Gossip in computer science
A peer-to-peer communication protocol●
Inspired by epidemics, human gossip and social networks (spreading rumors)●
epidemic protocol (synonym)■
why ?■
Rumors and epidemics spread through a society at great speed and reach almost every member of the community without needing a central coordinator.●
Gossip was originally devised to solve the multicast problem●
Multicast●
we want to communicate a message to all the nodes in the network■
each node sends the message to only a few of the nodes■
Multicast problems?●
Fault tolerance: nodes might crash, packets might be dropped, etc.○
Scalability: millions, even hundreds of millions, of nodes○
Centralized: a single sender “multicasts” TCP/UDP packets to all the others.○
Tree-based multicast: too much redundancy from ACK/NACK messages.○
Multicast was originally used heavily in network devices (e.g. routers); how can it be leveraged at the application layer?○
Gossip basic
A node wants to share some information with the other nodes in the network. Periodically, it
selects a node at random from the set of nodes and exchanges the information with it. The node that
receives the information then does exactly the same thing.
Cycle●
number of rounds to spread the information■
Fanout●
number of nodes that a node gossips to within each cycle■
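The cycle / fanout relationship can be explored with a small simulation. This is a minimal sketch, not a reference implementation: the names push_gossip and pick_peers are made up for illustration, node 0 is assumed to be the only node that starts with the information, and every informed node is assumed to push to `fanout` random peers in each cycle.

```python
import random

def pick_peers(rng, node, num_nodes, fanout):
    """Pick `fanout` distinct random peers, never the node itself."""
    peers = set()
    while len(peers) < fanout:
        candidate = rng.randrange(num_nodes)
        if candidate != node:
            peers.add(candidate)
    return peers

def push_gossip(num_nodes, fanout, seed=0):
    """Return the number of cycles until every node has the information."""
    rng = random.Random(seed)
    informed = {0}                      # assumption: node 0 is the source
    cycles = 0
    while len(informed) < num_nodes:
        cycles += 1
        newly_informed = set()
        for node in informed:
            newly_informed |= pick_peers(rng, node, num_nodes, fanout)
        informed |= newly_informed      # nodes informed now gossip next cycle
    return cycles
```

Calling push_gossip(1000, 1) and push_gossip(1000, 4) with the same seed gives a feel for how a larger fanout reduces the number of cycles, at the cost of more messages per cycle.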
Gossip properties
Node selection must be random (or guarantee enough peer diversity)●
Node only stores local information. There is no shared global state.●
Communication is round-based (periodic).●
Transmission and processing capacity per round is limited.●
All nodes run the same protocol.●
Not deterministic (because of random peer sampling).●
Advantages of Gossip
Scalable●
Fault-tolerant●
Robust●
Decentralized●
Convergent consistency●
Gossip modeling
Consider a distributed network where nodes communicate by passing messages to each
other.
State of a node●
Susceptible - node has not received update yet (is not infected).■
Infected - node with an update it is willing to share.■
Removed - node has received the update but is not willing to share.■
Two basic models●
SI (anti-entropy)■
SIR (rumor-mongering)■
When does a node enter the R state?
👉 There are many algorithms. One of them counts redundant messages.
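The redundant-message rule for entering the R state can be sketched as follows. This is an illustration under stated assumptions: the function name rumor_mongering and the max_redundant threshold are hypothetical, node 0 starts the rumor, and a node moves to R once it has pushed to max_redundant peers that already knew the update.

```python
import random

def rumor_mongering(num_nodes, max_redundant, seed=0):
    """SIR-style push gossip. Returns (nodes reached, messages sent)."""
    rng = random.Random(seed)
    state = ["S"] * num_nodes        # S = susceptible, I = infected, R = removed
    redundant = [0] * num_nodes      # pushes that hit an already-informed peer
    state[0] = "I"                   # assumption: node 0 starts the rumor
    messages = 0
    while "I" in state:
        for node in [n for n in range(num_nodes) if state[n] == "I"]:
            # Uniform random choice among the other nodes.
            peer = rng.randrange(num_nodes - 1)
            if peer >= node:
                peer += 1
            messages += 1
            if state[peer] == "S":
                state[peer] = "I"    # the update is news to the peer
            else:
                redundant[node] += 1
                if redundant[node] >= max_redundant:
                    state[node] = "R"    # stop sharing the update
    reached = sum(s != "S" for s in state)
    return reached, messages
```

Unlike SI (anti-entropy), this model can stop before every node is reached: a small max_redundant saves messages but leaves more nodes susceptible.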
Gossip modeling
Push / Pull / Push-Pull●
Push■
Infected (I) nodes send updates to susceptible (S) nodes and thereby infect them●
efficient when there are few updates.●
Pull■
all nodes actively pull for updates●
efficient when there are many updates.●
Push-Pull■
a node pushes when it has updates and also pulls for new updates●
the node and its selected peer exchange information in both directions●
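The three exchange styles can be sketched on a tiny key-value store. This is a minimal sketch under assumptions not stated on the slides: each node's state is a dict mapping a key to a (version, value) pair, a higher version wins, and the function names are made up for illustration.

```python
def newer_entries(local, remote):
    """Entries of `remote` that `local` is missing or holds in an older version."""
    return {
        key: entry
        for key, entry in remote.items()
        if key not in local or entry[0] > local[key][0]
    }

def push(sender, receiver):
    """Push: the sender infects the receiver with its newer entries."""
    receiver.update(newer_entries(receiver, sender))

def pull(requester, responder):
    """Pull: the requester fetches the responder's newer entries."""
    requester.update(newer_entries(requester, responder))

def push_pull(a, b):
    """Push-pull: after one exchange both sides hold the newest entries."""
    to_a = newer_entries(a, b)
    to_b = newer_entries(b, a)
    a.update(to_a)
    b.update(to_b)
```

For example, if a holds {"x": (1, "old")} and b holds {"x": (3, "new")}, a single push_pull(a, b) leaves both with version 3 of "x".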
Gossip modeling
https://flopezluis.github.io/gossip-simulator/
Gossip Applications
Applications
Cluster membership●
Information dissemination●
Failure detection●
Database replication●
Overlay network●
Aggregations●
Cluster Membership
 Who are my live peers ?
Desired properties
Connectedness●
Balance●
Short path-length●
Reducing redundancy●
Scalability●
Accuracy●
Property             Full       Partial
Connectedness        👍         👌
Short path-length    👍         👌
Accuracy             👌         👌
Balance              👌         👌
Redundancy           👎 High    👍 Low
Scalability          👎 Low     👍 High
Cluster Membership
✅
SWIM - Cornell University 2002●
SCAMP - Microsoft Research 2003●
CYCLON - Vrije University, The Netherlands, 2005●
HYPARVIEW - University of Lisbon, 2007●
Cluster Membership
SWIM - Cornell university (2002)
Scalable Weakly-consistent Infection-style Process Group Membership
https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
Properties
Scalable●
Weakly consistent●
Infection-style●
Membership protocol●
SWIM
Motivated by traditional heartbeating●
every interval T, notify peers of liveness■
if no update is received from peer P after T * limit, mark P as dead.■
heartbeat = membership + failure detection■
Heartbeating does well at:●
Completeness - yes!■
strong completeness - every crashed node is eventually detected by all correct nodes.●
Accuracy - high!■
Heartbeat problems?●
Network load: O(N^2) messages per interval, since every node notifies every other node■
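The heartbeat scheme above can be sketched in a few lines. This is a simplified illustration, not code from any real system: the class name HeartbeatDetector and the helper messages_per_interval are hypothetical, and time is passed in explicitly as a number of seconds.

```python
class HeartbeatDetector:
    """Marks a peer as dead when no heartbeat arrived for interval * limit."""

    def __init__(self, peers, interval, limit):
        self.timeout = interval * limit
        self.last_seen = {peer: 0.0 for peer in peers}

    def on_heartbeat(self, peer, now):
        # Record the time of the most recent liveness notification.
        self.last_seen[peer] = now

    def dead_peers(self, now):
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout
        )

def messages_per_interval(num_nodes):
    """All-to-all heartbeating: every node notifies every other node."""
    return num_nodes * (num_nodes - 1)
```

With 100 nodes this is already 9,900 messages per interval, which is the quadratic load that SWIM sets out to avoid.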
SWIM is trying to ...
Separate two problems and solve them one-by-one●
Failure detection (👉 “live” peers)○
Membership protocol (👉 list of peers)○
Optimization●
Reduce network load○
Failure detection○
decrease processing time●
increase accuracy●
Failure Detection properties
One step back...●
The two properties of a distributed system□
Safety - nothing bad ever happens○
Liveness - something good eventually happens.○
Failure Detection properties●
Completeness (liveness) - the failure detector eventually detects every node that has crashed.□
Accuracy (safety) - the failure detector does not wrongly suspect nodes that have not crashed.□
Failure Detection properties
Degree of completeness●
depends on how many crashed nodes the failure detector suspects within a given period□
Strong completeness - every faulty node is eventually permanently suspected by every non-faulty node○
Weak completeness - every faulty node is eventually permanently suspected by some non-faulty node○
Degree of accuracy●
depends on how many mistakes the failure detector makes within a given period□
Strong accuracy - no node is suspected (by any node) before it crashes○
Weak accuracy - some non-faulty node is never suspected○
Eventual strong accuracy - after some time, the system satisfies strong accuracy.○
Eventual weak accuracy - after some time, the system satisfies weak accuracy.○
SWIM Failure Detection
Each node in a set of N nodes, once per interval:●
Choose a random peer○
Ping - ACK□
Indirect ping (only if no ACK arrives)○
Choose k random peers□
Ask each of them to ping the target on our behalf○
Evaluation:
completeness: every node will eventually be pinged!●
accuracy: “high” (🔍)●
speed of detection: 1 * Interval●
network load: (4*k + 2) * N ~ O(N)●
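One probe, including the indirect path, can be sketched as a pure function that also counts messages. This is an illustrative model, not the protocol's wire format: the function name probe is hypothetical, and delivers(src, dst) is an assumed callback that says whether a message from src reaches dst (it stands in for crashes and packet loss).

```python
import random

def probe(prober, target, members, delivers, k, rng):
    """Return (target_confirmed_alive, messages_sent) for one probe."""
    messages = 1                               # ping: prober -> target
    if delivers(prober, target):
        messages += 1                          # ack: target -> prober
        if delivers(target, prober):
            return True, messages
    # No ACK: ask up to k other members to ping the target for us.
    others = [m for m in members if m not in (prober, target)]
    confirmed = False
    for helper in rng.sample(others, min(k, len(others))):
        messages += 1                          # ping-req: prober -> helper
        if not delivers(prober, helper):
            continue
        messages += 1                          # ping: helper -> target
        if not delivers(helper, target):
            continue
        messages += 1                          # ack: target -> helper
        if not delivers(target, helper):
            continue
        messages += 1                          # forwarded ack: helper -> prober
        if delivers(helper, prober):
            confirmed = True
    return confirmed, messages
```

The worst case is 2 messages for the direct attempt plus 4 per helper, which is where the (4*k + 2) per-node bound comes from.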
SWIM Membership Protocol
Aware of nodes joining / leaving●
Motivated by Gossip●
Piggyback approach: membership updates ride on the failure detector's ping / ACK messages■
Infection-style○
ping is sent to a random peer□
eventually (weakly) consistent□
updates are sent peer-to-peer□
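Piggybacking can be sketched as a small buffer that hands out a few pending updates for every outgoing ping or ACK. This is a simplified sketch: the class name PiggybackBuffer and the parameters max_sends and per_message are hypothetical, and the retirement rule (drop an update after a fixed number of sends) is an assumption made for illustration.

```python
class PiggybackBuffer:
    """Holds membership updates to attach to outgoing ping / ACK messages."""

    def __init__(self, max_sends, per_message):
        self.max_sends = max_sends        # retire an update after this many sends
        self.per_message = per_message    # updates that fit in one message
        self.pending = {}                 # update -> times sent so far

    def add(self, update):
        self.pending.setdefault(update, 0)

    def take(self):
        """Pick updates for the next message, least-gossiped first."""
        chosen = sorted(self.pending, key=self.pending.get)[: self.per_message]
        for update in chosen:
            self.pending[update] += 1
            if self.pending[update] >= self.max_sends:
                del self.pending[update]
        return chosen
```

Preferring the least-sent updates gives fresh news a head start, and no extra messages are needed because the updates travel on traffic the failure detector sends anyway.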
SWIM - Optimization
Suspicion state - to improve accuracy
Trade-off between failure detection time and false positives.●
Introduce suspicion state.●
A 👉 B: Ping! I suspect C has failed■
B 👉 A: ACK!■
A few moments later■
A, B 👉 C: Ping! Are you dead?□
C 👉 A, B: ACK! (I’m not 😋)□
State FSM
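The suspicion mechanism can be read as a three-state machine per peer. The sketch below is an assumption-laden illustration, not the FSM from the slide image: the class name MemberState and the suspicion_timeout parameter are hypothetical, and dead is treated as a final state.

```python
ALIVE, SUSPECT, DEAD = "alive", "suspect", "dead"

class MemberState:
    """Tracks one peer: alive -> suspect -> dead, with suspect -> alive on an ACK."""

    def __init__(self, suspicion_timeout):
        self.state = ALIVE
        self.suspicion_timeout = suspicion_timeout
        self.suspected_at = None

    def on_probe_failed(self, now):
        # A failed probe only raises suspicion; it does not declare death.
        if self.state == ALIVE:
            self.state, self.suspected_at = SUSPECT, now

    def on_ack(self):
        # The suspected peer answered in time, so the suspicion is refuted.
        if self.state == SUSPECT:
            self.state, self.suspected_at = ALIVE, None

    def tick(self, now):
        # Suspicion that is not refuted within the timeout becomes a failure.
        if self.state == SUSPECT and now - self.suspected_at >= self.suspicion_timeout:
            self.state = DEAD
```

A longer suspicion_timeout gives slow peers more time to refute the suspicion (fewer false positives) but delays the detection of real failures, which is the trade-off named above.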
SWIM - Optimization
Round-robin probe peer selection
Shuffle the peer set randomly■
Ping in round-robin order■
Evaluation:
Completeness: improved; time to detection becomes bounded○
State FSM
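Round-robin probe selection can be sketched as a generator. This is a minimal sketch with made-up names (probe_targets is hypothetical); it assumes a fixed peer list and reshuffles it after every full pass.

```python
import random

def probe_targets(peers, rng=None):
    """Yield probe targets forever: shuffle, walk the list once, reshuffle."""
    rng = rng or random.Random()
    order = list(peers)
    while order:
        rng.shuffle(order)
        yield from order
```

Because every pass visits each peer exactly once, a crashed peer is probed within at most two passes over the list, instead of only with high probability as under purely random selection.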
SWIM - Limitations
Node leave vs fail●
Re-joining●
Event ordering●
Message encryption●
Peer metadata●
Custom payload●
Network participants●
More details:  https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
SWIM - Implementation
memberlist https://github.com/hashicorp/memberlist●
Serf and Consul rely on the SWIM-based memberlist for failure detection and group membership.●
Other “announced” applications
Cassandra internal - understand gossip https://www.youtube.com/watch?v=FuP1Fvrv6ZQ●
AWS S3 gossip http://status.aws.amazon.com/s3-20080720.html●
Slicing structured overlay network
T-MAN  https://www.researchgate.net/publication/225403352_T-Man_Gossip-Based_Overlay_Topology_Management●
https://managementfromscratch.wordpress.com/2016/04/01/introduction-to-gossip●

More Related Content

What's hot

Grokking Techtalk #38: Escape Analysis in Go compiler
 Grokking Techtalk #38: Escape Analysis in Go compiler Grokking Techtalk #38: Escape Analysis in Go compiler
Grokking Techtalk #38: Escape Analysis in Go compilerGrokking VN
 
Architecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker servicesArchitecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker servicesLINE Corporation
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
CAP theorem and distributed systems
CAP theorem and distributed systemsCAP theorem and distributed systems
CAP theorem and distributed systemsKlika Tech, Inc
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservicespflueras
 
Dual write strategies for microservices
Dual write strategies for microservicesDual write strategies for microservices
Dual write strategies for microservicesBilgin Ibryam
 
High Concurrency Architecture at TIKI
High Concurrency Architecture at TIKIHigh Concurrency Architecture at TIKI
High Concurrency Architecture at TIKINghia Minh
 
High performance network programming on the jvm oscon 2012
High performance network programming on the jvm   oscon 2012 High performance network programming on the jvm   oscon 2012
High performance network programming on the jvm oscon 2012 Erik Onnen
 
Présentation de Apache Zookeeper
Présentation de Apache ZookeeperPrésentation de Apache Zookeeper
Présentation de Apache ZookeeperMichaël Morello
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeSlim Baltagi
 
Thiết kế hệ thống E-Commerce yêu cầu mở rộng
Thiết kế hệ thống E-Commerce yêu cầu mở rộngThiết kế hệ thống E-Commerce yêu cầu mở rộng
Thiết kế hệ thống E-Commerce yêu cầu mở rộngNguyen Minh Quang
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016DataStax
 
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...HostedbyConfluent
 
Ame 2269 ibm mq high availability
Ame 2269 ibm mq high availabilityAme 2269 ibm mq high availability
Ame 2269 ibm mq high availabilityAndrew Schofield
 
Zookeeper Architecture
Zookeeper ArchitectureZookeeper Architecture
Zookeeper ArchitecturePrasad Wali
 
SOLID & Design Patterns
SOLID & Design PatternsSOLID & Design Patterns
SOLID & Design PatternsGrokking VN
 
Building Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons LearnedBuilding Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons Learnedparallellabs
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problemGrokking VN
 
Introducing Saga Pattern in Microservices with Spring Statemachine
Introducing Saga Pattern in Microservices with Spring StatemachineIntroducing Saga Pattern in Microservices with Spring Statemachine
Introducing Saga Pattern in Microservices with Spring StatemachineVMware Tanzu
 

What's hot (20)

Grokking Techtalk #38: Escape Analysis in Go compiler
 Grokking Techtalk #38: Escape Analysis in Go compiler Grokking Techtalk #38: Escape Analysis in Go compiler
Grokking Techtalk #38: Escape Analysis in Go compiler
 
Architecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker servicesArchitecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker services
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
CAP theorem and distributed systems
CAP theorem and distributed systemsCAP theorem and distributed systems
CAP theorem and distributed systems
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
 
Dual write strategies for microservices
Dual write strategies for microservicesDual write strategies for microservices
Dual write strategies for microservices
 
High Concurrency Architecture at TIKI
High Concurrency Architecture at TIKIHigh Concurrency Architecture at TIKI
High Concurrency Architecture at TIKI
 
High performance network programming on the jvm oscon 2012
High performance network programming on the jvm   oscon 2012 High performance network programming on the jvm   oscon 2012
High performance network programming on the jvm oscon 2012
 
Présentation de Apache Zookeeper
Présentation de Apache ZookeeperPrésentation de Apache Zookeeper
Présentation de Apache Zookeeper
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
 
CAP Theorem
CAP TheoremCAP Theorem
CAP Theorem
 
Thiết kế hệ thống E-Commerce yêu cầu mở rộng
Thiết kế hệ thống E-Commerce yêu cầu mở rộngThiết kế hệ thống E-Commerce yêu cầu mở rộng
Thiết kế hệ thống E-Commerce yêu cầu mở rộng
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
 
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
 
Ame 2269 ibm mq high availability
Ame 2269 ibm mq high availabilityAme 2269 ibm mq high availability
Ame 2269 ibm mq high availability
 
Zookeeper Architecture
Zookeeper ArchitectureZookeeper Architecture
Zookeeper Architecture
 
SOLID & Design Patterns
SOLID & Design PatternsSOLID & Design Patterns
SOLID & Design Patterns
 
Building Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons LearnedBuilding Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons Learned
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problem
 
Introducing Saga Pattern in Microservices with Spring Statemachine
Introducing Saga Pattern in Microservices with Spring StatemachineIntroducing Saga Pattern in Microservices with Spring Statemachine
Introducing Saga Pattern in Microservices with Spring Statemachine
 

Similar to Grokking Techtalk #39: Gossip protocol and applications

Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Module: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconModule: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconIoannis Psaras
 
BSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfBSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfStevenJoeBiago
 
Ake hedman why we need to unite and why vscp is a solution to a problem
Ake hedman  why we need to unite and why vscp is a solution to a problemAke hedman  why we need to unite and why vscp is a solution to a problem
Ake hedman why we need to unite and why vscp is a solution to a problemWithTheBest
 
Iot with-the-best & VSCP
Iot with-the-best & VSCPIot with-the-best & VSCP
Iot with-the-best & VSCPAke Hedman
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesDefCamp
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesPriyanka Aash
 
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Ontico
 
Overcoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceOvercoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceScyllaDB
 
Computer network (7)
Computer network (7)Computer network (7)
Computer network (7)NYversity
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introductionShehaaz Saif
 
Ple18 web-security-david-busby
Ple18 web-security-david-busbyPle18 web-security-david-busby
Ple18 web-security-david-busbyDavid Busby, CISSP
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper diveRobert Kubiś
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world dataAthira Mukundan
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed systemmilkesa13
 
Computer network (8)
Computer network (8)Computer network (8)
Computer network (8)NYversity
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemVarad Meru
 
IT Security Basics For Managers
IT Security Basics For ManagersIT Security Basics For Managers
IT Security Basics For ManagersDaniel Owens
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingAmuhinda Hungai
 
SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)Mohammad Awais Javaid
 

Similar to Grokking Techtalk #39: Gossip protocol and applications (20)

Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Module: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness BeaconModule: drand - the Distributed Randomness Beacon
Module: drand - the Distributed Randomness Beacon
 
BSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfBSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdf
 
Ake hedman why we need to unite and why vscp is a solution to a problem
Ake hedman  why we need to unite and why vscp is a solution to a problemAke hedman  why we need to unite and why vscp is a solution to a problem
Ake hedman why we need to unite and why vscp is a solution to a problem
 
Iot with-the-best & VSCP
Iot with-the-best & VSCPIot with-the-best & VSCP
Iot with-the-best & VSCP
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
 
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case StudiesIoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
IoT Malware: Comprehensive Survey, Analysis Framework and Case Studies
 
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
 
Overcoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for PerformanceOvercoming Variable Payloads to Optimize for Performance
Overcoming Variable Payloads to Optimize for Performance
 
Computer network (7)
Computer network (7)Computer network (7)
Computer network (7)
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introduction
 
Ple18 web-security-david-busby
Ple18 web-security-david-busbyPle18 web-security-david-busby
Ple18 web-security-david-busby
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper dive
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world data
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed system
 
Computer network (8)
Computer network (8)Computer network (8)
Computer network (8)
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
 
IT Security Basics For Managers
IT Security Basics For ManagersIT Security Basics For Managers
IT Security Basics For Managers
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
 
SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)SNMP(Simple Network Management Protocol)
SNMP(Simple Network Management Protocol)
 

More from Grokking VN

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking VN
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking VN
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking VN
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking VN
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking VN
 
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking VN
 
Grokking Techtalk #37: Software design and refactoring
 Grokking Techtalk #37: Software design and refactoring Grokking Techtalk #37: Software design and refactoring
Grokking Techtalk #37: Software design and refactoringGrokking VN
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking VN
 
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer... Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...Grokking VN
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking VN
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking VN
 
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking VN
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking VN
 
Grokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking VN
 
Grokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking VN
 
Grokking TechTalk #26: Compare ios and android platform
Grokking TechTalk #26: Compare ios and android platformGrokking TechTalk #26: Compare ios and android platform
Grokking TechTalk #26: Compare ios and android platformGrokking VN
 
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...
Grokking TechTalk #24: Thiết kế hệ thống Background Job Queue bằng Ruby & Pos...Grokking VN
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking VN
 
Grokking TechTalk #21: Deep Learning in Computer Vision
Grokking TechTalk #21: Deep Learning in Computer VisionGrokking TechTalk #21: Deep Learning in Computer Vision
Grokking TechTalk #21: Deep Learning in Computer VisionGrokking VN
 
Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking VN
 

More from Grokking VN (20)

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles Thinking
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystified
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
 
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platformGrokking Techtalk #40: AWS’s philosophy on designing MLOps platform
Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform
 
Grokking Techtalk #37: Software design and refactoring
 Grokking Techtalk #37: Software design and refactoring Grokking Techtalk #37: Software design and refactoring
Grokking Techtalk #37: Software design and refactoring
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking Techtalk #39: Gossip protocol and applications

  • 1. Gossip protocol and applications. Tu Nguyen, Staff Software Engineer, Axon
  • 3.
  • 4. Gossip in computer science
      • A peer-to-peer communication protocol.
      • Inspired by epidemics, human gossip, and social networks (spreading rumors).
          • Also known as an epidemic protocol.
          • Why? Rumors and epidemics travel through a society at great speed and reach almost every member of the community without needing a central coordinator.
      • Gossip was originally devised to solve the multicast problem.
      • Multicast
          • We want to communicate a message to all the nodes in the network.
          • Each node sends the message to only a few of the nodes.
      • Multicast problems
          • Fault tolerance: nodes might crash, packets might be dropped, etc.
          • Scalability: millions, or hundreds of millions, of nodes.
          • Centralized: a single sender “multicasts” TCP/UDP packets to the others.
          • Tree-based multicast: too much redundancy from ACK/NACK messages.
          • Multicast was originally used heavily in network devices (e.g. routers); how can it be leveraged at the application layer?
  • 5. Gossip basics
      • A node wants to share some information with the other nodes in the network. Periodically, it selects a node at random from the set of nodes and exchanges the information with it. The node that receives the information does exactly the same thing.
      • Cycle: the number of rounds needed to spread the information.
      • Fanout: the number of nodes that a node gossips to within each cycle.
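
The cycle and fanout definitions above can be illustrated with a small simulation. This is a minimal sketch, not the speaker's code: it assumes a push-only variant in which every informed node gossips once per cycle, and the function name `simulate_push_gossip`, the node numbering, and the seed are all hypothetical.

```python
import random

def simulate_push_gossip(n_nodes, fanout, seed=0):
    """Return the number of cycles until every node holds the information.

    Assumes fanout <= n_nodes - 1 whenever n_nodes > 1.
    """
    rng = random.Random(seed)
    informed = {0}                      # assumption: node 0 starts with the information
    cycles = 0
    while len(informed) < n_nodes:
        newly_informed = set()
        for node in informed:
            # Each informed node gossips to `fanout` distinct random peers.
            candidates = [p for p in range(n_nodes) if p != node]
            newly_informed.update(rng.sample(candidates, fanout))
        informed |= newly_informed
        cycles += 1
    return cycles

if __name__ == "__main__":
    for fanout in (1, 2, 3):
        print(fanout, simulate_push_gossip(1000, fanout))
```

Because the informed set can grow by at most a factor of `fanout + 1` per cycle, the cycle count can never be lower than the logarithm of the network size in that base; the simulation lets you see how close random peer selection gets to that bound.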
  • 6. Gossip properties
      • Node selection must be random (or guarantee enough peer diversity).
      • A node stores only local information. There is no shared global state.
      • Communication is round-based (periodic).
      • Transmission and processing capacity per round is limited.
      • All nodes run the same protocol.
      • Not deterministic (because peer sampling is random).
  • 8. Gossip modeling
      • Consider a distributed network where nodes pass messages to each other.
      • State of a node
          • Susceptible (S): the node has not received the update yet (is not infected).
          • Infected (I): the node has an update it is willing to share.
          • Removed (R): the node has received the update but is no longer willing to share it.
      • Two basic models
          • SI (anti-entropy)
          • SIR (rumor-mongering)
      • When does a node enter the R state? 👉 There are many algorithms. One of them counts redundant messages.
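
The SIR model with a redundant-message counter can be sketched as follows. This is an illustration under stated assumptions rather than a specific published algorithm: each infected node contacts one random peer per round, and it moves to the removed state after `max_redundant` contacts with peers that already knew the rumor. The names `rumor_mongering` and `max_redundant` are hypothetical.

```python
import random

def rumor_mongering(n_nodes, max_redundant, seed=0):
    """Run SIR rumor-mongering until no infected node is left.

    Requires n_nodes >= 2. Returns (final states, number of rounds).
    """
    rng = random.Random(seed)
    state = ["S"] * n_nodes             # everyone starts susceptible
    redundant = [0] * n_nodes           # per-node count of redundant contacts
    state[0] = "I"                      # assumption: node 0 originates the rumor
    rounds = 0
    while "I" in state:
        # Snapshot the infected set so nodes infected now start next round.
        for node in [i for i, s in enumerate(state) if s == "I"]:
            peer = rng.choice([p for p in range(n_nodes) if p != node])
            if state[peer] == "S":
                state[peer] = "I"       # the rumor was news to the peer
            else:
                redundant[node] += 1    # the peer already knew it
                if redundant[node] >= max_redundant:
                    state[node] = "R"   # lose interest, stop spreading
        rounds += 1
    return state, rounds

if __name__ == "__main__":
    final, rounds = rumor_mongering(1000, max_redundant=3)
    print(rounds, final.count("S"), "nodes never heard the rumor")
```

Unlike the SI model, this process stops on its own, at the cost that some nodes may remain susceptible when the last infected node is removed.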
  • 9. Gossip modeling: Push / Pull / Push-Pull
      • Push
          • Infected (I) nodes are the ones sending to, and infecting, susceptible (S) nodes.
          • Efficient when there are few updates.
      • Pull
          • All nodes actively pull for updates.
          • Efficient when there are many updates.
      • Push-Pull
          • A node pushes when it has updates and also pulls for new updates.
          • The node and the selected peer exchange information.
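
The three exchange styles differ only in the direction in which state is merged. The sketch below assumes each node keeps a dictionary mapping a key to a `(version, value)` pair and that a higher version wins; this data layout and the function names are assumptions made for illustration, not something the slides specify.

```python
def merge(target, source):
    """Copy into `target` every entry of `source` that is missing or newer."""
    for key, (version, value) in source.items():
        if key not in target or target[key][0] < version:
            target[key] = (version, value)

def push(node, peer):
    """The initiating node sends its state to the selected peer."""
    merge(peer, node)

def pull(node, peer):
    """The initiating node asks the selected peer for its state."""
    merge(node, peer)

def push_pull(node, peer):
    """Both directions in one exchange; afterwards the two states agree."""
    push(node, peer)
    pull(node, peer)

if __name__ == "__main__":
    a = {"x": (2, "new")}
    b = {"x": (1, "old"), "y": (1, "only-b")}
    push_pull(a, b)
    print(a == b)
```

A real implementation would typically exchange digests first and ship only the entries the other side lacks; the full-state merge here keeps the direction of each style easy to see.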
  • 13. Applications: cluster membership, information dissemination, failure detection, database replication, overlay networks, aggregation.
  • 14. Cluster membership: who are my live peers?
      • Desired properties: connectedness, balance, short path length, reduced redundancy, scalability, accuracy.
      • Two approaches: full vs. partial.
  • 15. Cluster Membership ✅
      • Full: 👍 connectedness, 👍 short path length, 👌 accuracy, 👌 balance, 👎 high redundancy, 👎 low scalability.
      • Partial: 👌 connectedness, 👌 short path length, 👌 accuracy, 👌 balance, 👍 low redundancy, 👍 high scalability.
  • 16. Cluster membership protocols
      • SWIM - Cornell University, 2002
      • SCAMP - Microsoft Research, 2003
      • CYCLON - Vrije University, The Netherlands, 2005
      • HYPARVIEW - University of Lisbon, 2007
  • 17. SWIM - Cornell University (2002)
      • Scalable Weakly-consistent Infection-style Process Group Membership
      • https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
      • Properties: scalable, weakly consistent, infection-style, membership protocol.
  • 18. SWIM
      • Motivated by traditional heartbeating:
          • Every interval T, notify peers of liveness.
          • If no update is received from peer P after T * limit, mark P as dead.
          • Heartbeat = membership + failure detection.
      • What heartbeating does well:
          • Completeness: yes! It gives strong completeness: every crashed node is eventually detected by all correct nodes.
          • Accuracy: high!
      • Heartbeat problems:
          • Network load: N^2.
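
The heartbeat rule above ("no update after T * limit means dead") and its quadratic cost can be sketched in a few lines. The class name `HeartbeatDetector` and the helper `heartbeat_messages_per_interval` are hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
class HeartbeatDetector:
    """Tracks the last heartbeat seen from each peer."""

    def __init__(self, interval, limit):
        self.timeout = interval * limit   # T * limit from the slide
        self.last_seen = {}

    def heartbeat(self, peer, now):
        """Record that `peer` reported liveness at time `now`."""
        self.last_seen[peer] = now

    def dead_peers(self, now):
        """Peers whose last heartbeat is older than the timeout."""
        return sorted(p for p, seen in self.last_seen.items()
                      if now - seen > self.timeout)

def heartbeat_messages_per_interval(n):
    """All-to-all heartbeating: each of n nodes notifies the other n - 1."""
    return n * (n - 1)

if __name__ == "__main__":
    detector = HeartbeatDetector(interval=1.0, limit=3)
    detector.heartbeat("a", now=0.0)
    detector.heartbeat("b", now=0.0)
    detector.heartbeat("a", now=3.0)
    print(detector.dead_peers(now=3.5))
    print(heartbeat_messages_per_interval(1000))
```

The message count grows with the square of the cluster size, which is the network-load problem that motivates SWIM.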
  • 19. SWIM is trying to ...
      • Separate two problems and solve them one by one:
          • Failure detection (👉 “live” peers)
          • Membership protocol (👉 list of peers)
      • Optimize:
          • Reduce network load.
          • Failure detection: decrease processing time and increase accuracy.
  • 20. Failure detection properties
      • One step back: the two classes of properties of a distributed system
          • Safety: nothing bad ever happens.
          • Liveness: something good eventually happens.
      • Failure detection properties
          • Completeness (a liveness property): the failure detector eventually detects the node(s) that have crashed.
          • Accuracy (a safety property): the decisions the failure detector makes about a node are correct.
  • 21. Failure detection properties
      • Degree of completeness: depends on how many crashed nodes the failure detector suspects within a certain period.
          • Strong completeness: every faulty node is eventually permanently suspected by every non-faulty node.
          • Weak completeness: every faulty node is eventually permanently suspected by some non-faulty node.
      • Degree of accuracy: depends on the number of mistakes the failure detector makes within a certain period.
          • Strong accuracy: no node is suspected (by any node) before it crashes.
          • Weak accuracy: some non-faulty node is never suspected.
          • Eventual strong accuracy: after some time, the system satisfies strong accuracy.
          • Eventual weak accuracy: after some time, the system satisfies weak accuracy.
  • 22. SWIM failure detection
      • Each node in the set of N nodes:
          • Chooses a random peer and sends it a ping, expecting an ACK.
          • If and only if no ACK arrives, chooses k random peers and asks them to ping the target indirectly.
      • Evaluation
          • Completeness: every node will eventually be pinged.
          • Accuracy: “high” (🔍).
          • Speed of detection: 1 * interval.
          • Network load: (4*k + 2) * N ~ O(N).
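
One probe of the failure detector described above can be sketched as follows. This is a simplified, single-process illustration: the network is replaced by a caller-supplied `reachable(src, dst)` predicate (a hypothetical stand-in for "a ping from src to dst is acknowledged"), and the indirect pings are tried one after another, whereas a real implementation would send them in parallel and wait for a timeout.

```python
import random

def probe(node, members, reachable, k, rng):
    """Run one SWIM-style probe from `node`; return (target, verdict).

    Requires at least one member other than `node`.
    """
    others = [m for m in members if m != node]
    target = rng.choice(others)                 # choose a random peer
    if reachable(node, target):                 # direct ping, ACK received
        return target, "alive"
    # No ACK: ask up to k other random peers to ping the target for us.
    helpers = [m for m in others if m != target]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        if reachable(node, helper) and reachable(helper, target):
            return target, "alive"              # indirect ACK relayed back
    return target, "failed"

def max_messages_per_round(n, k):
    """Worst case per node: ping + ACK, plus 4 messages per indirect probe."""
    return (4 * k + 2) * n

if __name__ == "__main__":
    crashed = {"c"}
    up = lambda src, dst: src not in crashed and dst not in crashed
    print(probe("a", ["a", "b", "c", "d"], up, k=2, rng=random.Random(1)))
    print(max_messages_per_round(n=1000, k=3))
```

The indirect step is what guards against a bad link between two healthy nodes: the target is reported as failed only when neither the direct ping nor any of the k relays gets through.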
  • 23. SWIM membership protocol
      • Keeps nodes aware of joining and leaving nodes.
      • Motivated by gossip.
          • Piggyback approach.
          • Infection-style:
              • A ping is sent to a random peer.
              • Eventually (weakly) consistent.
              • Updates are sent peer to peer.
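
The piggyback idea is that membership updates ride along on the pings the failure detector already sends, so dissemination costs no extra messages. The sketch below is an assumption-laden illustration: the `Member` class, the message layout, and the fixed `RETRANSMITS` bound are all hypothetical, and conflicting updates about the same peer are not ordered.

```python
RETRANSMITS = 3  # assumed bound on how many pings carry each update

class Member:
    def __init__(self, name, peers=()):
        self.name = name
        self.peers = set(peers)
        self.pending = {}   # (kind, peer) -> pings left to piggyback on

    def note(self, kind, peer):
        """Apply a join/leave update; queue it for gossip only if it was news."""
        if kind == "join" and peer != self.name and peer not in self.peers:
            self.peers.add(peer)
        elif kind == "leave" and peer in self.peers:
            self.peers.discard(peer)
        else:
            return                      # already known: do not spread again
        self.pending[(kind, peer)] = RETRANSMITS

    def make_ping(self):
        """Build a ping that carries every update still worth repeating."""
        updates = list(self.pending)
        for update in updates:
            self.pending[update] -= 1
            if self.pending[update] == 0:
                del self.pending[update]
        return {"from": self.name, "updates": updates}

    def on_ping(self, ping):
        """Absorb the piggybacked updates; new ones get re-gossiped later."""
        for kind, peer in ping["updates"]:
            self.note(kind, peer)

if __name__ == "__main__":
    a, b = Member("a", {"b"}), Member("b", {"a"})
    a.note("join", "c")
    b.on_ping(a.make_ping())
    print(sorted(b.peers))
```

Each receiver that learns something new becomes a spreader in turn, which is the infection-style behaviour the slide refers to.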
  • 24. SWIM optimization: suspicion state (to improve accuracy)
      • Trade-off between failure detection time and false positives.
      • Introduce a suspicion state.
          • A 👉 B: Ping! Suspect C failed
          • B 👉 A: ACK!
          • A few moments later
              • A, B 👉 C: Ping! Are you dead?
              • C 👉 A, B: ACK! (I'm not 😋)
      • [Figure: state FSM]
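
The suspicion state can be read as a small state machine per peer. The slide's FSM figure is not recoverable from the transcript, so the transition table below is an assumption based on the exchange described above; the state and event names are hypothetical.

```python
# Assumed transitions: a failed probe only raises suspicion, an ACK from the
# suspected peer clears it, and only an expired suspicion marks the peer dead.
TRANSITIONS = {
    ("alive", "probe_failed"): "suspect",
    ("suspect", "ack"): "alive",
    ("suspect", "timeout"): "dead",
}

def step(state, event):
    """Return the next state; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((state, event), state)

if __name__ == "__main__":
    state = "alive"
    for event in ("probe_failed", "ack", "probe_failed", "timeout"):
        state = step(state, event)
        print(event, "->", state)
```

Waiting in the suspect state costs detection time but gives a slow or briefly unreachable peer the chance to answer, which is the trade-off against false positives mentioned on the slide.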
  • 25. SWIM optimization: round-robin probe peer selection
      • Randomly sort the peer set.
      • Ping in round-robin order.
      • Evaluation: completeness improves and detection becomes time-bounded.
      • [Figure: state FSM]
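
Round-robin selection over a shuffled peer list can be sketched with a generator. The name `probe_order` is hypothetical, and the sketch assumes a fixed peer set that is reshuffled after each full pass; handling peers that join mid-pass is left out.

```python
import random

def probe_order(peers, seed=None):
    """Yield probe targets: shuffle the peers, walk the list, then reshuffle."""
    rng = random.Random(seed)
    order = list(peers)
    if not order:
        return                          # nothing to probe
    while True:
        rng.shuffle(order)              # randomly sort the peer set
        yield from order                # ping in round-robin order

if __name__ == "__main__":
    targets = probe_order(["a", "b", "c", "d"], seed=1)
    print([next(targets) for _ in range(8)])
```

With n peers, any given peer is selected at least once in every window of 2n - 1 probes, whereas purely random selection gives no such upper bound.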
  • 26. SWIM limitations
      • Node leave vs. fail
      • Re-joining
      • Event ordering
      • Message encryption
      • Peer metadata
      • Custom payload
      • Network participants
      • More details: https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
  • 27. SWIM implementation
      • memberlist: https://github.com/hashicorp/memberlist
      • Serf and Consul rely on the SWIM-based memberlist for failure detection and group membership.
  • 28. Other “announced” applications
      • Cassandra internals, understanding gossip: https://www.youtube.com/watch?v=FuP1Fvrv6ZQ
      • AWS S3 gossip: http://status.aws.amazon.com/s3-20080720.html
      • Slicing and structured overlay networks (T-Man): https://www.researchgate.net/publication/225403352_T-Man_Gossip-Based_Overlay_Topology_Management
      • https://managementfromscratch.wordpress.com/2016/04/01/introduction-to-gossip