Communication is hard. Something as seemingly straightforward as connecting a few programs across a few sockets is often quite difficult when you’re dealing with real-life situations — so difficult that creating software and services to do so is a multi-billion dollar business.
If you often find yourself looking for better solutions for connecting your processes, sharing data in a simple and effective way, synchronizing threads and improving your IPC game, this webinar is for you. We’ll show you how to overcome the most vexing communication obstacles.
We’ll leverage ZeroMQ, an embeddable networking library that acts like a concurrency framework, and present established communication patterns for different tasks, from IPC to P2P and from pub-sub to gossip networks. After a brief introduction to ZeroMQ, we’ll discuss a variety of common communication problems and study potential solutions for each using vivid examples. We’ll also show you how best to integrate these solutions with your Qt codebase.
Communication Patterns for Distributed Services
1. Integrated Computer Solutions Inc. www.ics.com
Matteo Brichese
June 25, 2020
Agenda
Introduction
1. The problem with communication
2. What do distributed services mean to us?
Communication Patterns
1. 0MQ Quick Intro
2. Reliability: what does it mean to be reliable?
3. The basics: Request-Reply, Pub-Sub
4. RRR aka The Pirate Patterns
5. A P2P protocol
The Problem with Communication
● It’s hard
● It’s messy
● It’s centralized
What Do Distributed Services Mean to Us?
Wikipedia:
A distributed system is a system whose components are located on
different networked computers, which communicate and coordinate their
actions by passing messages to one another. The components interact with
one another in order to achieve a common goal [...]
A computer program that runs within a distributed system is called a
distributed program.
Is a program that runs between multiple processes a distributed system?
What if the processes are within the same machine?
0MQ Quick Intro
● ZMQ looks like an embeddable networking library, but acts like a concurrency framework.
● ZMQ does NOT provide any pre-made solutions
● ZMQ provides basic sockets to build your own communication infrastructure
The basics:
● REQ-REP, which connects a set of clients to a set of services.
● PUB-SUB, which connects a set of publishers to a set of subscribers.
● PUSH-PULL, which connects nodes in a fan-out/fan-in pattern that can have multiple steps and loops.
● PAIR, which connects two sockets exclusively. This is a pattern for connecting two threads in a process.
0MQ Quick Intro
During this webinar we will use:
● REQ-REP sockets
● PUB-SUB sockets
● DEALER-ROUTER sockets
A DEALER acts like an asynchronous REQ
A ROUTER acts like an asynchronous REP
Defining Reliability
What does it mean to build a reliable communication pattern?
Common Failures:
1. Application code
2. Proxy code
3. Overflow
4. Network failure
5. Hardware failure
6. “Exotic” network failures
7. Connection/Datacenter failures
Our goal is to protect against failures 1 through 5.
Request-Reply
Request-Reply, a classic client-server solution
● A REQ socket connects to a REP socket
● REQ sends a request
● REP replies to the request
● Everything works in lockstep: if REP hangs, REQ hangs
Pros: easy to implement, easy to understand
Cons: easy to break, not really useful in a distributed scenario
Publisher-Subscriber
● A SUB socket connects to a PUB socket
● PUB publishes
● SUB receives
Pros: easy to implement, easy to understand, ideal for delivering the same content to multiple services
Cons: the late-joiner problem (a subscriber that connects late misses everything published before it joined)
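The late-joiner problem can be shown without any networking. The following is a toy in-memory model in Python, not ZeroMQ code: `ToyPublisher` and its methods are hypothetical names invented for this sketch, and the only assumption is that a publisher delivers each message to whoever is subscribed at that moment and keeps no history.

```python
class ToyPublisher:
    """In-memory stand-in for a PUB socket (hypothetical, for illustration only)."""

    def __init__(self):
        self._inboxes = []

    def subscribe(self):
        # Each subscriber gets its own inbox; nothing published earlier is replayed.
        inbox = []
        self._inboxes.append(inbox)
        return inbox

    def publish(self, message):
        # Deliver only to the subscribers that exist right now.
        for inbox in self._inboxes:
            inbox.append(message)


pub = ToyPublisher()
early = pub.subscribe()
pub.publish("update 1")
late = pub.subscribe()       # joins after the first message went out
pub.publish("update 2")

print(early)  # ['update 1', 'update 2']
print(late)   # ['update 2']
```

The late subscriber never sees "update 1", which is why services that need the full state usually pair PUB-SUB with some other way of fetching a snapshot.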
Reliable Request-Reply Patterns
How can we improve on the client-server pattern?
1. Add a timeout to the client’s receive call
2. Retry the request, or give up, if no reply arrives within the allotted time
This is known as the Lazy Pirate pattern.
Note: Reliable Request-Reply, aka RRR, is a pirate inside joke :)
Lazy Pirate Pattern
Request-Reply with retry
● A REQ socket connects to a REP socket
● REQ sends a request
● REP replies to the request
● If REP fails, REQ retries after a timeout
● After N retries, REQ gives up
Pros: easy to implement, easy to understand
Cons: REP is still a single point of failure
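The retry loop at the heart of this pattern can be sketched in a few lines. This is a transport-agnostic illustration in Python, not the webinar's code: `lazy_pirate_request` and `FlakyServer` are hypothetical names, and a `queue.Queue` stands in for the socket so the example runs on its own. With a real ZeroMQ REQ socket the client typically also has to close and reopen the socket before resending, because REQ enforces strict send/receive alternation.

```python
import queue


def lazy_pirate_request(send, recv, request, timeout_s=0.05, retries=3):
    """Send `request`, wait up to `timeout_s` for a reply, try at most `retries` times.

    Returns (reply, attempts); reply is None if every attempt timed out.
    """
    for attempt in range(1, retries + 1):
        send(request)
        try:
            return recv(timeout_s), attempt
        except queue.Empty:
            continue  # no reply in time: try again
    return None, retries


class FlakyServer:
    """Hypothetical stand-in for a REP peer that ignores its first few requests."""

    def __init__(self, drop_first):
        self.drop_first = drop_first
        self.seen = 0
        self.replies = queue.Queue()

    def send(self, request):
        self.seen += 1
        if self.seen > self.drop_first:
            self.replies.put(b"reply to " + request)

    def recv(self, timeout_s):
        # Raises queue.Empty when nothing arrives within the timeout.
        return self.replies.get(timeout=timeout_s)


server = FlakyServer(drop_first=2)
reply, attempts = lazy_pirate_request(server.send, server.recv, b"hello")
print(reply, attempts)      # b'reply to hello' 3

dead = FlakyServer(drop_first=10)
gave_up, tries = lazy_pirate_request(dead.send, dead.recv, b"hello")
print(gave_up, tries)       # None 3
```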
Simple Pirate Pattern
Request-Reply with retry and load distribution
● A proxy is added in front of multiple servers/workers
● The proxy hands each client request to an available server
Pros: adds failover to Lazy Pirate
Cons: the proxy is now the single point of failure, and it cannot tell when a worker has hung or died
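One common way for such a proxy to pick "an available server" is to keep a queue of workers that have announced they are ready. The sketch below models only that bookkeeping in plain Python; `ToyProxy`, `worker_ready` and `dispatch` are hypothetical names, and no sockets are involved.

```python
from collections import deque


class ToyProxy:
    """Bookkeeping for a load-distributing proxy (illustrative only, no sockets)."""

    def __init__(self):
        self.ready = deque()   # workers waiting for work, oldest first

    def worker_ready(self, worker_id):
        # A worker signals readiness at startup and again after each reply.
        self.ready.append(worker_id)

    def dispatch(self, request):
        # Hand the request to the worker that has been idle longest,
        # or return None if no worker is available right now.
        if not self.ready:
            return None
        return self.ready.popleft(), request


proxy = ToyProxy()
proxy.worker_ready("worker-A")
proxy.worker_ready("worker-B")

first = proxy.dispatch("req-1")    # ('worker-A', 'req-1')
second = proxy.dispatch("req-2")   # ('worker-B', 'req-2')
third = proxy.dispatch("req-3")    # None: both workers are busy

proxy.worker_ready("worker-A")     # A finished and is ready again
retried = proxy.dispatch("req-3")  # ('worker-A', 'req-3')
print(first, second, third, retried)
```

Note that a worker which crashes while still listed as ready stays in the queue forever, which is exactly the weakness named in the cons.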
Paranoid Pirate Pattern
Request-Reply with retry, load distribution and worker management
● A heartbeat is added to the proxy to monitor worker status
Pros: improved worker management
Cons: same as Simple Pirate
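Heartbeat-based worker tracking can be sketched as a table of expiry times. This is an illustrative Python model, not the webinar's implementation: `WorkerRegistry`, the one-second interval and the liveness factor of three are assumptions chosen for the example, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class WorkerRegistry:
    """Tracks which workers are alive based on their heartbeats (illustrative)."""

    def __init__(self, interval_s=1.0, liveness=3):
        # A worker is dropped after `liveness` missed heartbeat intervals.
        self.ttl = interval_s * liveness
        self.expiry = {}   # worker id -> time after which it counts as dead

    def heartbeat(self, worker_id, now):
        # Any heartbeat (or reply) from a worker pushes its expiry forward.
        self.expiry[worker_id] = now + self.ttl

    def purge(self, now):
        # Remove and return the workers whose expiry time has passed.
        dead = sorted(w for w, t in self.expiry.items() if t <= now)
        for worker_id in dead:
            del self.expiry[worker_id]
        return dead

    def alive(self):
        return sorted(self.expiry)


registry = WorkerRegistry(interval_s=1.0, liveness=3)
registry.heartbeat("w1", now=0.0)
registry.heartbeat("w2", now=0.0)
registry.heartbeat("w1", now=2.0)     # w1 keeps beating, w2 goes silent

removed = registry.purge(now=3.5)
print(removed)            # ['w2']
print(registry.alive())   # ['w1']
```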
How Do We Find Other Nodes?
Preemptive discovery:
● Uses ping
● No need for node cooperation
● Needs root privileges
● One node per IP address
Cooperative discovery:
● Uses UDP
● Needs node cooperation
● No root privileges required
● Multiple nodes per IP address, using UUIDs
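For cooperative discovery, each node periodically announces itself in a small UDP datagram. The sketch below shows one possible way to pack and unpack such a beacon in Python. The layout (a 3-byte "ZRE" tag, a version byte, a 16-byte UUID and a 2-byte port) is an assumption made for this example, and `encode_beacon` and `decode_beacon` are hypothetical helpers; sending the bytes over a UDP broadcast socket is not shown.

```python
import struct
import uuid

# Assumed layout: tag, version, node UUID, port (network byte order, no padding).
BEACON_FORMAT = "!3sB16sH"
BEACON_TAG = b"ZRE"
BEACON_VERSION = 1


def encode_beacon(node_id, port):
    """Pack a node's UUID and listening port into a fixed-size datagram."""
    return struct.pack(BEACON_FORMAT, BEACON_TAG, BEACON_VERSION, node_id.bytes, port)


def decode_beacon(data):
    """Return (node_id, port), or None if the datagram is not one of ours."""
    if len(data) != struct.calcsize(BEACON_FORMAT):
        return None
    tag, version, raw_id, port = struct.unpack(BEACON_FORMAT, data)
    if tag != BEACON_TAG or version != BEACON_VERSION:
        return None
    return uuid.UUID(bytes=raw_id), port


my_id = uuid.uuid4()
datagram = encode_beacon(my_id, 49152)
print(len(datagram))              # 22
print(decode_beacon(datagram))    # (UUID('...'), 49152)
print(decode_beacon(b"noise"))    # None
```

Because the UUID rather than the IP address identifies the node, several nodes on the same host can announce themselves on different ports.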
How Do We Know if a Node is Still Up?
UDP Heartbeat
Under high load, UDP packets may never reach the target. How do we ensure that we don’t remove a node that is still up?
We can fall back to a TCP heartbeat once several UDP heartbeats have been missed. If the TCP heartbeat fails too, the node is considered unreachable.
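The UDP-then-TCP decision can be sketched as a small counter per peer. This is an illustrative Python model with hypothetical names (`PeerMonitor`, `beacon_missed`, `tcp_probe`): the TCP check is passed in as a plain callable so the example runs without a network, and the threshold of two missed beacons is an arbitrary choice.

```python
class PeerMonitor:
    """Decides whether a peer is still up (illustrative, no real sockets)."""

    def __init__(self, tcp_probe, max_missed=3):
        self.tcp_probe = tcp_probe      # callable(peer_id) -> bool
        self.max_missed = max_missed
        self.missed = {}                # peer id -> consecutive missed UDP beacons

    def beacon_received(self, peer_id):
        self.missed[peer_id] = 0

    def beacon_missed(self, peer_id):
        """Record a missed UDP beacon; return True if the peer still counts as up."""
        self.missed[peer_id] = self.missed.get(peer_id, 0) + 1
        if self.missed[peer_id] < self.max_missed:
            return True                 # too early to worry
        if self.tcp_probe(peer_id):
            self.missed[peer_id] = 0    # UDP was lost, but the peer answered over TCP
            return True
        del self.missed[peer_id]        # both checks failed: drop the peer
        return False


reachable_over_tcp = {"busy-node"}
monitor = PeerMonitor(lambda peer: peer in reachable_over_tcp, max_missed=2)
monitor.beacon_received("busy-node")
monitor.beacon_received("gone-node")

busy_states = [monitor.beacon_missed("busy-node") for _ in range(2)]
gone_states = [monitor.beacon_missed("gone-node") for _ in range(2)]
print(busy_states)   # [True, True]
print(gone_states)   # [True, False]
```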
How Do I Connect to All Other Nodes?
● A combination of PUB-SUB
● A combination of REQ-REP or DEALER-ROUTER
Zyre uses a combination of one receiving ROUTER socket and multiple sending DEALER sockets, in what is called the Harmony pattern.
The Harmony Pattern
The high-level overview
● One ROUTER socket that we bind to an ephemeral port, which we broadcast in our beacons.
● One DEALER socket per peer that we connect to the peer's ROUTER socket.
● Reading from our ROUTER socket.
● Writing to the DEALER socket we hold for that peer.
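The shape of this topology (one inbound endpoint per node, one outbound link per peer) can be modelled in memory. The Python sketch below is a toy, not ZeroMQ or Zyre code: `ToyNode` is a hypothetical class in which a list plays the role of the bound ROUTER socket and a dictionary of per-peer handles plays the role of the DEALER sockets.

```python
class ToyNode:
    """In-memory model of one node in the topology (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.inbox = []    # stands in for our single bound ROUTER socket
        self.links = {}    # peer name -> outgoing link, one per peer (DEALER stand-in)

    def connect(self, peer):
        # "Connect a DEALER to the peer's ROUTER": keep a handle to the peer's inbox.
        self.links[peer.name] = peer.inbox

    def send(self, peer_name, payload):
        # Write on the link dedicated to that peer, tagged with our own name.
        self.links[peer_name].append((self.name, payload))

    def receive_all(self):
        # Read everything that arrived on our one inbox, from any peer.
        messages = list(self.inbox)
        self.inbox.clear()
        return messages


nodes = {name: ToyNode(name) for name in ("a", "b", "c")}
for node in nodes.values():
    for peer in nodes.values():
        if peer is not node:
            node.connect(peer)

nodes["a"].send("b", "hello from a")
nodes["c"].send("b", "hello from c")
received = nodes["b"].receive_all()
print(received)   # [('a', 'hello from a'), ('c', 'hello from c')]
```

Each node reads from exactly one place and writes on exactly one link per peer, so every connection carries traffic in a single direction.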
Thank you!
All examples can be found at:
https://github.com/bricke/0mq-patterns
Any questions?