2. What’s a “distributed system”?
You know you have a distributed system when the crash of a computer you’ve never heard of
stops you from getting any work done. —LESLIE LAMPORT
3. Your mission, should you choose to accept it:
• Read data from one “place”
• Write it to another “place”
5. System Event | Actual Latency | Scaled Latency
One CPU cycle | 0.4 ns | 1 s
Level 1 cache access | 0.9 ns | 2 s
Level 2 cache access | 2.8 ns | 7 s
Level 3 cache access | 28 ns | 1 min
Main memory access (DDR DIMM) | ~100 ns | 4 min
Intel® Optane™ DC persistent memory access | ~350 ns | 15 min
Intel® Optane™ DC SSD I/O | <10 μs | 7 hrs
NVMe SSD I/O | ~25 μs | 17 hrs
SSD I/O | 50–150 μs | 1.5–4 days
Rotational disk I/O | 1–10 ms | 1–9 months
Internet call: San Francisco to New York City | 65 ms | 5 years
Internet call: San Francisco to Hong Kong | 141 ms | 11 years
Source: Systems Performance: Enterprise and the Cloud, Brendan Gregg
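To read the scaled column: one second stands for a single 0.4 ns CPU cycle, i.e. every actual latency is multiplied by roughly 2.5 × 10⁹.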
mov eax, [ebx]    ; read: load the value at the address in ebx into eax
mov [ecx], eax    ; write: store eax to the address in ecx
(try
  (let [[partitioner msg] (cha
    (kp/send-message @pr
      (kp/message topic (.getBytes partitioner) (.getBytes ^String
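The snippet above is cut off in the export, but the point stands: a one-line "send" hides serialization, partition-leader lookup, and a network round trip to the broker. A minimal sketch of the same idea, assuming kp aliases the clj-kafka.producer namespace and a broker on localhost:9092 (topic name and config values here are illustrative, not the original code):

(require '[clj-kafka.producer :as kp])

(def p (kp/producer {"metadata.broker.list" "localhost:9092"
                     "serializer.class"     "kafka.serializer.DefaultEncoder"
                     "partitioner.class"    "kafka.producer.DefaultPartitioner"}))

;; Looks like a local call, but it performs network I/O to the Kafka broker.
(kp/send-message p (kp/message "events" (.getBytes "hello, distributed world")))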
32. • Don’t take distributed actions lightly
• Be careful when using abstractions that hide distributed calls
• Big data means low-probability problems are daily occurrences
33. Read more
• Fallacies of distributed computing
• Vector clocks
• CRDTs - https://www.serverless.com/blog/crdt-explained-supercharge-serverless-at-edge
• https://bartoszsypytkowski.com/the-state-of-a-state-based-crdts/
• Google Spanner
https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
• https://research.google/pubs/pub45855/
Editor's Notes
8 fallacies
Formulated by Peter Deutsch and James Gosling (father of Java) in 1994-97
SKB – Linux socket buffer (fundamental structure that handles any packet sent or received )
[31334587.454365] xennet: skb rides the rocket: 21 slots
[31334772.157791] xennet: skb rides the rocket: 20 slots
[31335254.431489] xennet: skb rides the rocket: 19 slots
http://vger.kernel.org/~davem/skb.html
Anyway - not just the infrastructure; there are also other things that can affect reliability, like DDoS attacks
Switches have an MTBF of ~50K hours (just told Yaniv: Ericsson achieved nine 9s availability with their AXD301 switch)
Aggravated by microservices
99.99%^30 ≈ 99.7% uptime. 0.3% of 1 billion requests = 3,000,000 failures. 2+ hours of downtime/month even if all dependencies have excellent uptime.
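A quick REPL check of that compound figure (my own throwaway arithmetic, assuming 30 dependencies at 99.99% each):

;; availability of a request that touches 30 services, each 99.99% available
(Math/pow 0.9999 30)   ;=> ~0.997, i.e. roughly 99.7% end-to-end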
Retry, circuit breakers, caching, alerts
Bandwidth keeps getting better and better, but latencies don’t: light has a fast but finite speed, so a ping from Europe to the US and back is 30 ms even if everything is perfect
We’ve seen the numbers
Bandwidth gets higher – but we also send much more data
Generally we can get the bandwidth -> but it comes with a $ cost, so we need to keep in mind that we have to work within limitations
I don’t think that anyone is really likely to make this false assumption these days
We all know we need to deal with security – but are we doing enough? (checkmarx, whitesmoke)
But we’re just starting to move service-to-service traffic to SSL (Kafka, Spark still TBD)
The requirements for K8s security have changed significantly between the time I set up AKS and now
Build, runtime (kubei)
Same as the previous one – not likely that anyone believes that
That’s why we’re using configuration, discovery and such
The fact is no single person understands all aspects of the system
DevOps culture -> passing some responsibility to dev (you build it, you own it)
Monitoring – who is going to wake up?
Again config
Opex –
But more than that, serialization, encryption, …
Even my home has iOS, macOS, Windows, Android (phones, streamer), a printer (embedded), smart TVs
We’re *mostly* C#
We have “BIG DATA” technologies – we can *just* add more instances
Audit – running on Hadoop, so NameNodes, so ZooKeeper
TCO - think operational complexity
Choose the right tool for the job – if it fits in memory, don’t use needless technologies. I’ve answered countless times on Stack Overflow: “Why is Spark slow?”
Doing things during the pipeline vs. adding machines to deal with queries
Cattle not pets
Bandwidth keeps getting better and better, but latencies don’t: light has a fast but finite speed, so a ping from Europe to the US and back is 30 ms even if everything is perfect
We’ve seen the numbers
No ordering!
Time
Clock drift
Getting the time: NTP / PTP (Precision Time Protocol)
TrueTime
Leslie Lamport is a famous distributed computing researcher
Suppose that event A occurs in a data center, and then later event B.
Did A “cause” B to happen?
What if A was at 10am, and B at 11:30pm. Does knowing time help?
What if A was a command to register a new student, and B was an internal action that creates her “meal card” account?
What if A was an email from the department asking me about my teaching preferences, and B was my reply?
For Leslie, event A causes event B if there was a computation that somehow was triggered by A, and B was part of it. Inspired by physics!
But this is hard to discover automatically.
Instead, Leslie focused on potential causality: A “might” have caused B.
Under what conditions is this possible?
Somehow, information must flow from A to B.
Let’s use LogicalClock(X) to denote the relevant LogicalClock value for event X. We can time-stamp events and messages.
If A -> B, then LogicalClock(A) < LogicalClock(B)
But… if LogicalClock(A) < LogicalClock(B), perhaps A didn’t happen before B!
We can overcome that if we use a VectorClock
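A minimal Clojure sketch of both mechanisms (my own illustration of the standard algorithms, not code from the talk, with hypothetical node keys :a/:b/:c): a Lamport clock only gives the one-way implication above, while comparing vector clocks detects when A could not have happened before B.

;; Lamport logical clock: tick on every local event,
;; and take max(local, received) + 1 when a message arrives.
(defn lamport-tick [clock] (inc clock))
(defn lamport-receive [local-clock msg-clock] (inc (max local-clock msg-clock)))

;; Vector clock: a map of node -> counter.
(defn vc-tick [vc node] (update vc node (fnil inc 0)))
(defn vc-merge [a b] (merge-with max a b))   ;; on receive: merge, then tick own entry
(defn vc-happened-before? [a b]
  (and (every? (fn [[node n]] (<= n (get b node 0))) a)
       (some   (fn [[node n]] (< (get a node 0) n)) b)))

;; B has seen A's event, so A -> B; C's event is concurrent with A's.
(vc-happened-before? {:a 1} {:a 1 :b 1})   ;=> true
(vc-happened-before? {:a 1} {:c 1})        ;=> false (concurrent)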
Conflict-free Replicated Data Type
No meaning for ordering
Can be a base implementation for logical clocks (and vector clocks)
Growing only – (always increasing)
Can handle multi invocation
Still a problem around “zero” (ordering) -> effectively it is only a constructor
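The grow-only counter is small enough to sketch. A minimal Clojure illustration (my own, with hypothetical node keys :n1/:n2, not code from the talk): one non-decreasing slot per node, merged element-wise with max.

;; G-Counter: per-node slots that only grow; the value is the sum of all slots.
(defn g-inc   [counter node] (update counter node (fnil inc 0)))
(defn g-merge [a b] (merge-with max a b))   ;; commutative, associative, idempotent
(defn g-value [counter] (reduce + 0 (vals counter)))

;; Replicas increment independently and converge after merging in any order.
(g-value (g-merge {:n1 2} {:n1 1 :n2 3}))   ;=> 5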
Any idea why we’d want 2 counters?
The max operation will not work with a single counter – we can’t handle duplicate messages
Allowing max values
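Hence the two counters: a grow-only counter for increments and another for decrements, so the max-based merge never loses a decrement. A minimal sketch along the same lines as the G-Counter above (my illustration):

;; PN-Counter: a pair of G-Counters; value = sum(increments) - sum(decrements).
(defn pn-inc   [c node] (update-in c [:p node] (fnil inc 0)))
(defn pn-dec   [c node] (update-in c [:n node] (fnil inc 0)))
(defn pn-merge [a b] {:p (merge-with max (:p a) (:p b))
                      :n (merge-with max (:n a) (:n b))})
(defn pn-value [{:keys [p n]}]
  (- (reduce + 0 (vals p)) (reduce + 0 (vals n))))

(pn-value (pn-merge {:p {:n1 3} :n {:n1 1}} {:p {:n2 2} :n {}}))   ;=> 4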
Need causal ordering (remove => add => remove != remove => remove => add)
2 sets
Need ordering
Will also need 2 sets to support removes
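That is the two-phase set: a grow-only “added” set plus a grow-only “removed” (tombstone) set, with the caveat that a removed element can never be re-added. A minimal sketch (my illustration):

;; 2P-Set: an element is present if it is in :added and not in :removed.
;; A remove wins permanently, regardless of merge order (hence OR-Set for re-adds).
(defn tp-add       [s x] (update s :added conj x))
(defn tp-remove    [s x] (update s :removed conj x))
(defn tp-merge     [a b] {:added   (into (:added a) (:added b))
                          :removed (into (:removed a) (:removed b))})
(defn tp-contains? [{:keys [added removed]} x]
  (and (contains? added x) (not (contains? removed x))))

(def s0 {:added #{} :removed #{}})
(tp-contains? (tp-merge (tp-add s0 :x) (-> s0 (tp-add :x) (tp-remove :x))) :x)   ;=> false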