Part 2: What you should know about Elasticity, Scalability and Location Transparency in Reactive systems
In the second of three webinars with live Q/A, we look into how organizations with Reactive systems are able to adaptively scale in an elastic, infrastructure-efficient way, and the role that location transparency plays in distributed Reactive systems. Reactive Streams contributor and deputy CTO at Typesafe, Inc., Viktor Klang reviews what you should know about:
How Reactive systems enable near-linear scalability in order to increase performance proportionally to the allocation of resources, avoiding the constraints of bottlenecks or synchronization points within the system
How elasticity builds upon scalability in Reactive systems to automatically adjust throughput to varying demand by proportionally and dynamically adding or removing resources at runtime.
The role of location transparency in distributed computing (in systems running on a single node or on a cluster) and how decoupling runtime instances from their references lets systems embrace network constraints like partial failure, network splits, dropped messages and more.
In the third and final webinar in the series with Jonas Bonér, we go over resiliency, failures vs errors, isolation (and containment), delegation and replication in Reactive systems.
Yesterday → Today
Single machines → Clusters of machines
Single core processors → Multicore processors
Expensive RAM → Cheap RAM
Expensive disk → Cheap disk
Slow networks → Fast networks
Few concurrent users → Lots of concurrent users
Small data sets → Large data sets
Latency in seconds → Latency in milliseconds
“A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added.”
- Werner Vogels
Needs to be async and non-blocking
all the way down
Universal Scalability Law
«N is the number of users or the number of CPUs, α is the contention level, β is the coherency latency, and C is the relative capacity»
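The law itself can be written as C(N) = N / (1 + α(N−1) + βN(N−1)). A minimal sketch of how contention and coherency shape capacity (the α and β values below are illustrative assumptions, not measurements):

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) under the Universal Scalability Law.

    Contention (alpha) gives diminishing returns; coherency cost
    (beta) can make adding nodes *reduce* total capacity."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# With beta = 0 the law reduces to Amdahl's Law: capacity plateaus.
# With beta > 0 capacity peaks and then declines (negative returns).
for n in (1, 8, 32, 128):
    print(n, round(usl_capacity(n, alpha=0.03, beta=0.0005), 2))
```

With these example coefficients, 128 nodes yield *less* capacity than 32 nodes, which is exactly the incoherency penalty the slide warns about.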
The Role of Immutable State
• Great to represent facts
• Messages and Events
• Database snapshots
• Representing the succession of time
• Mutable State is ok if local and contained
• Allows Single-threaded processing
• Allows single writer principle
• Feels more natural
• Publish the results to the world as Immutable State
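The bullets above — immutable facts in, contained local mutation, immutable results out — can be sketched like this (the `Cart`/`ItemAdded` names are illustrative, not from any library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)          # immutable: a fact that has happened
class ItemAdded:
    item: str
    qty: int

class Cart:
    """Mutable state is fine when local and contained: only this
    single-writer object mutates _items. The outside world sees
    only immutable facts (events in, frozen snapshots out)."""
    def __init__(self):
        self._items = {}                       # local, contained mutation

    def handle(self, event: ItemAdded):
        self._items[event.item] = self._items.get(event.item, 0) + event.qty

    def snapshot(self):
        return frozenset(self._items.items())  # publish immutable state

cart = Cart()
for e in (ItemAdded("book", 1), ItemAdded("book", 2)):
    cart.handle(e)
print(cart.snapshot())   # frozenset({('book', 3)})
```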
• Mobile / IoT
• HTTP and Microservices
• “NoSQL” DBs
• Big Data
• Fast Data
Distributed Computing is the new normal
Reality check
• separation in space & time gives us
• communication for coordination
• variable delays
• partial failures
• partial/local/stale knowledge
Peter Deutsch’s 8 Fallacies of Distributed Computing:
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Linearizability
“Under linearizable consistency, all operations appear to have executed atomically in an order that is consistent with the global real-time ordering of operations.”
- Herlihy & Wing 1991
“In general, application developers simply do not implement large scalable applications assuming distributed transactions.”
- Pat Helland, Life beyond Distributed Transactions: an Apostate’s Opinion
The Event Log
• Append-Only Logging
• Database of Facts
• Two models:
• One single Event Log
• Strong Consistency
• Multiple sharded Event Logs
• Strong + Eventual Consistency
EXPERT TRAINING
Delivered on-site for Akka, Spark, Scala and Play
Help is just a click away. Get in touch with Typesafe about our training courses.
• Intro Workshop to Apache Spark
• Fast Track & Advanced Scala
• Fast Track to Akka with Java or Scala
• Fast Track to Play with Java or Scala
• Advanced Akka with Java or Scala
Ask us about local training available from 24 Typesafe partners in 14 countries around the world.
CONTACT US — Learn more about on-site training
Editor's Notes
Scalability is something that I’m very passionate about.
Remember being very fascinated by distributed systems in the first courses at the university.
Guilty of doing CORBA, EJBs, RMI, XA etc.
Learned a lot the hard way—through agony and pain.
Talk:
mixed bag of things that
what works and
doesn’t work—from my point of view.
This is really hard stuff.
But a few good principles & practices
can make all the difference.
Let’s go back in history and see what has changed.
Since the rules of the game have changed—fundamentally.
Not everyone might be aware of it.
Clusters:
We have a dist system from day one. With all its challenges and possibilities. Very different world.
Multicore:
Mutable state used to be ok (von Neumann arch etc.). Today we need better tools, and threads/locks won’t cut it.
RAM:
Opens up for in-memory DB and caching, have the whole data set in memory.
Disk:
No reason to ever delete data—no more RDBMS-style in-place updates. Now we can keep all data around forever. Full history.
Network:
Faster to write to network than to disk. Opens up for new efficient replication strategies.
Lots of users:
Today most apps are put on the Internet with a massive potential user base.
Data:
Massive amounts of data need to be moved around, analyzed and stored
Latency:
Users today are extremely impatient.
…and just around the corner we can expect:
Billions of devices all connected — Internet of Things
Smart cars, health monitors, smart homes, phones
GSM Association predicts: 24 billion devices by 2020
Others think it can be twice that: 50 billion
Computers will be running
100s
or 1000s
or perhaps even 100s of thousands of cores
Need different designs and different tools.
Reactive apps THE answer on the server side
Example:
1980: Cray2 was considered a supercomputer (and very expensive)
2014: iPhone has more computing power (but really cheap)
Cost Gravity (Pieter Hintjens):
Generalization of Moore’s Law
Technology is getting
More and more advanced
At a cheaper and cheaper price
Exponentially
Extremely exciting, but also terrifying
responsive: react to users
The goal for any app should be that it is responsive—at all times:
not just under blue skies
under load & spikes—planned or unplanned
under failure
Responsiveness means that problems may be detected quickly and dealt with effectively
Responsive systems focus on providing rapid and consistent response times
Establishing reliable upper bounds so they deliver a consistent quality of service
The system stays responsive in the face of failure.
=> resilient: react to failure
Resilient means: to spring back into shape, not just being fault-tolerant
often bolted on after using the wrong tools,
part of design from day 1, natural state in lifecycle, manage failure
isolation/containment
avoid cascading failures
repair/heal themselves
The system stays responsive under varying workload
=> elastic: react to load, scale on demand
React to changes in the input rate by increasing or decreasing the resources allocated to service these inputs.
Need designs with no contention points or central bottlenecks
=> ability to shard or replicate components and distribute inputs among them.
Support predictive and adaptive scaling algorithms
Cost-effective use of commodity hardware
message-driven: react to messages
async, non-blocking,
efficient, lazy, push not pull
async boundary =>
loose coupling
isolation/containment + reify errors as messages
location transparency = same model and semantics everywhere
explicit MP enables:
load management, elasticity
flow control, back pressure
brings all the other traits together
A scalable application is able to be expanded according to its usage.
Need to react to increased load
Be adaptive and elastic
Be able to scale up/down and out/in on demand.
Scale on demand
Rapid growth—popularity
Unpredictable spikes and usage patterns—or planned ones
Benefits for businesses
Changing business requirements
Pay for what you use
Cuts costs and minimizes risk of having
too much hardware idling
too little hardware (lose sleep)
Elastic means being able to:
scale on demand
scale up and down
Scalability is an enabler for Elasticity.
Viktor’s comment:
He seems to confuse performance and scalability
My definition:
Performance is the capability of a system to provide a certain response time
Scalability is the capability of a system to maintain that response time as more resources are added to deal with increasing load.
Performance is tangled with three other characteristics:
Latency
Throughput
Scalability
Many different views and definitions.
We need to utilize multicore architectures efficiently.
Memory Management in modern CPUs is very advanced
Cache coherence and invalidation protocols
Prefetching, branch speculation etc.
Hierarchical caches: L1, L2, L3
Haswell processor (in the image):
Cores 2–4, 8—Each core has a:
Local L1 cache 64 KB
Local L2 cache 256 KB
Shared L3 cache 2 MB to 8 MB
With increasingly more latency
(NEXT) So most caches are local
Same with NUMA—Non-Uniform Memory Access
Image of ccNUMA (Cache Coherent NUMA)
Cheap to access local memory on your socket
But very expensive across sockets
Roundtrip between sockets is 40 nanoseconds
Today CPUs are so efficient they normally have to stall, waiting for data
So access to local data is fast
Affects how we think about & design software
CPU doesn’t rely on plain luck—to beat the system
Like Raymond in Rain Man.
Ask it and you get the same reply: “We’re counting cards, counting cards…”
It makes three bets:
Temporal: using regular caching, LRU
Spatial: things close, are likely to be used together
Pattern:
Prefetching that detects patterns in the code
Iterating over an Array—vs a Linked List
Also does Branch Speculation
Can sound complicated and involved
But the good news is…
Clean code matters
Short methods
Single Responsibility Principle, Compose well
Simple logic with little branching
Things used together are put together: No Feature Envy
No clever stuff
Share nothing matters
Local state stays local
Copy state and ship it off instead of sharing and introducing contention
If you think of how modern CPUs work
What really matters is to maximize Locality of Reference.
I.e. locality of data
Keep data close to its processing context
Minimize cache invalidations
How?
No shared mutable state
Co-locate data: Ensure they are on the same cache line.
Ideally pin threads to cores—not possible in Java
Single Writer Principle
Append Only Logs
Smart Batching etc.
Contention is the primary enemy to scalability
So, where is this bastard most likely to show up?
Physical contention points
CPU
Memory
Network IO
File IO
Database IO
Application contention points
Primitives
synchronized blocks, Locks, Barriers, Latches
Optimistic lock-free concurrency: CAS loops—contention can make it hard to make progress
Overuse of volatile variables—contention on the memory bus
Data structures
Shared concurrent data structures
Persistent data structures
Tree—Structural sharing—repointing of root node
Algorithms
Join points
scatter-gather
map-reduce
fork-join
So how should we address contention?
Never. Ever. Block.
Putting threads to sleep when blocking incurs a high wake-up cost
Roughly 650 ns (on Haswell MBP15)
Can run out of threads if blocking
If you need to block
Don’t use a single threaded runtime (Node)
Use sandboxing (protected regions)
Managed blocking—hint to thread pool to allocate new threads
Instead use:
Lock-free concurrency: Optimistic CAS-based
Async message passing (next slide)
Build on a Message-driven core
Use Async Message Passing
Concurrent by design: Concurrency becomes workflow
Just like humans work and communicate
Allows you to model the real world (non-determinism)
Allows loosely coupled systems
Easier to: write, understand, maintain, evolve
Async systems
Initial hit of essential complexity, but..
Low accidental complexity
Complexity stays constant
Compare to synchronous systems
Lower initial essential complexity (familiar)
High accidental complexity
Out of the box tools:
Explicit Queues, MPI
Actors (Akka/Erlang)
Reactive Streams (Rx, Akka Streams)
Future composition
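The notes above — async stages connected by bounded queues, with flow control built in — can be sketched with plain asyncio (stage names here are illustrative, not any particular library's API):

```python
import asyncio

async def producer(out_q):
    for i in range(5):
        await out_q.put(i)          # send a message; never blocks a thread
    await out_q.put(None)           # poison pill: signal completion

async def doubler(in_q, out_q):
    # Each stage owns its state; immutable messages flow between stages.
    while (msg := await in_q.get()) is not None:
        await out_q.put(msg * 2)
    await out_q.put(None)

async def main():
    # Bounded queues give natural back pressure: a fast producer
    # is suspended until the downstream stage catches up.
    a, b = asyncio.Queue(maxsize=2), asyncio.Queue(maxsize=2)
    results = []
    async def consumer():
        while (msg := await b.get()) is not None:
            results.append(msg)
    await asyncio.gather(producer(a), doubler(a, b), consumer())
    return results

print(asyncio.run(main()))   # [0, 2, 4, 6, 8]
```

Actors, Reactive Streams and the like provide the same shape with far richer supervision, routing and back-pressure semantics.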
The simplest way to scale up on multicore is to fully embrace
Share Nothing Architecture
Async message passing
It gives you:
Great Locality of Reference
Minimized Contention
Since you have zero shared state
Uncontended local state
Independent processes communicating using values
So how should we design our algorithms?
Look at how old-timer winners like Caesar did it:
Divide and Conquer
Split up the work in small discrete independent tasks
Ideally Embarrassingly Parallel
No dependencies or coupling
Sequential IO writes are fast
No contention
Single threading can be your friend
Append Only Logging is a great tool
(talk about later in context of CQRS)
Smart Batching pattern (Martin Thompson)
THEN Use pipelining—stages with messages flowing between
2 types:
Can be synchronous
Can be asynchronous
Usually a combination
Ideally run on a single thread
No cache invalidations and copying of memory
Minimized contention
Can not block or the pipeline stalls
Single threaded pipelines are all good, IF
You can max out on your CPU
If not, introduce async stages—to increase parallelization.
Need to have built-in back pressure and flow control
Ideally done by the library:
Akka Streams optimisation through stream fusion
Tools:
SEDA, Actors
Disruptor, CSP
Futures or Reactive Streams
Contention: waiting or queueing for shared resources
Coherency: delay for data to become consistent
Amdahl's Law:
- the EFFECT contention has on a PARALLEL system
- CONTENTION gives DIMINISHING returns
Universal Scalability Law:
- ADDS Coherency
- INCOHERENCY can give NEGATIVE results
- Coherency == 0 => Amdahl’s Law
The 3 C’s: Concurrency, Contention, Coherency
Beta = 0 == Amdahl’s Law
To quote my dear friend The Legend of Klang….(NEXT)
As we all know, Immutability has immense value
Stable values, code that you can trust etc.
Lots of talking about immutable state and its role in building concurrent scalable systems
(NEXT) On a more serious level…
Great to represent Facts
Things that have happened
Values
Events
Database snapshots
Less ideal for a “working” data set
Persistent data structures can increase contention
Uses structural sharing with repointing on updates
Contention at the root node
Instead use a Share Nothing Architecture
with mutable state within each isolated processing unit
and immutable state sent between—events
But to truly scale on demand
We need to Scale OUT
We need Elasticity
We need to be able to add processing power—and a single node can’t give us that.
We need elasticity and efficient utilization of cluster and cloud computing architectures
Distributed systems are the new NORMAL.
We have them whether we want them or not…
Deal with it.
Alright, so do we all agree that in what we call Reality, we have multiple dimensions?
What things do we get from that?
Comm for Coo:
So given that entities do not exist in the same place, it means that they need to communicate if they want to coordinate -anything-.
Delays:
Ever observed a race-condition that as you tried to fix it just became less likely?
That’s shortened delays—making the window of opportunity smaller but still possible.
Partial failures:
Since things do not exist in the same location, they—especially if collaborating on something—risk failing individually: one succeeds while another fails, for example.
Knowledge:
Since communication is how we coordinate, it is also how we coordinate -information-, and since we have delays and partial failures, we will only ever have a subjective view of the world, one that is bound to be incomplete and stale.
In a distributed system you have
isolated machines, nodes, JVMs
You can’t possibly share memory
Which means that we need to
communicate asynchronously
using messages
Also, there is a network between.
which makes communication expensive
which is inherently unreliable
Does not just apply to nodes, but to
Clusters
Racks
Data centers
Distributed Computing is REALLY HARD.
But as we will see, solid principles can make it manageable.
But first let’s pay a visit to my own little graveyard of dist systems.
We need to learn from history’s mistakes
UNLEARN bad habits
…
So, what should we do instead?
We talked a lot about data locality
Well, it matters even more in a distributed system
Even more expensive to:
move data around repeatedly
ensure integrity of data
But let’s start with some theory.
Three models for consistency
Strong consistency
Eventual consistency
Weak consistency (not of much practical use)
Strong is defined by Linearizability
Less formally:
“A read will return the last completed write (made on any replica)”
Very strong (and expensive) guarantees
Sometimes needed
Minimize the dataset
Strong consistency protocols
Viewstamped Replication (Oki & Liskov 1988)
Paxos (Lamport 1989)
ZAB—Zookeeper Atomic Broadcast (Reed 2008)
Raft (Ongaro & Ousterhout 2013)
Partition tolerant (can make progress as long as a majority of replicas are reachable)
Dynamic master
High latency
Medium throughput
These protocols are hard to scale.
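The quorum arithmetic behind that partition-tolerance claim can be sketched as follows (the helper names are mine, not from any of the protocols above):

```python
def majority(n_replicas: int) -> int:
    """Smallest quorum size such that any two quorums must overlap
    in at least one replica -- the basis of Paxos/Raft safety."""
    return n_replicas // 2 + 1

def tolerated_failures(n_replicas: int) -> int:
    """Majority-quorum protocols make progress while a majority is
    reachable, so this many replicas may fail or be partitioned away."""
    return n_replicas - majority(n_replicas)

# Odd cluster sizes are the sweet spot: 4 replicas tolerate no more
# failures than 3, but cost an extra node.
for n in (3, 4, 5, 7):
    print(n, majority(n), tolerated_failures(n))
```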
RDBMSs provide strong consistency but are hard to scale
In general Strong Consistency is
Very Expensive
But sometimes needed
Minimize the dataset
THINK about your data.
Different data has different needs in terms of guarantees.
Coordination is the main killer of scalability in a cluster
Latency is higher, coordination cost is higher.
Coherence cost is higher.
Important discovery: CAP Conjecture by Eric Brewer 2000; proof by Gilbert & Lynch 2002
Consistency, Availability, Partition Tolerance => pick 2
Linearizability is impossible under network partitions
CA systems do not exist
In retrospect:
Very influential—but very narrow scope
“[CAP] has lead to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in the HAT paper
Linearizability is very often not required
Ignores latency—but in practice latency & partitions are deeply related
Partitions are rare—so why sacrifice C or A all the time?
Not black and white—can be fine-grained and dynamic
Read ‘CAP Twelve Years Later’ - Eric Brewer
But amazing work that influenced
the NOSQL movement and
Eventual Consistency
Eventual consistency—Essentially
Minimized Coordination
More headroom for Scalability & Availability
Definition: The storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.
Popularized by Amazon’s Dynamo
What’s behind Amazon’s shopping cart, EC2 and more
Epidemic Gossip using Vector Clocks
Failure detection
Consistent Hashing
Influenced: DynamoDB, Riak, Voldemort, Cassandra
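Dynamo-style replicas use vector clocks to decide whether one update happened before another or the two are concurrent. A minimal sketch, with clocks as plain dicts mapping node id to counter (an illustration, not Dynamo's actual wire format):

```python
def vc_merge(a: dict, b: dict) -> dict:
    """Pointwise max: the merged clock dominates both inputs."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_happened_before(a: dict, b: dict) -> bool:
    """a -> b iff a <= b pointwise and a != b. If neither clock
    dominates the other, the updates are concurrent (a conflict)."""
    return all(a.get(k, 0) <= b.get(k, 0) for k in a.keys() | b.keys()) and a != b

r1 = {"node-a": 2, "node-b": 1}
r2 = {"node-a": 1, "node-b": 3}
# Neither dominates: concurrent writes, so the app (or a CRDT) must merge.
assert not vc_happened_before(r1, r2) and not vc_happened_before(r2, r1)
print(sorted(vc_merge(r1, r2).items()))   # [('node-a', 2), ('node-b', 3)]
```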
Most DBs are only Key/Value stores
BUT CRDTs provides richer Eventually Consistent Data Types
Great tool for
For minimal coordination in the cluster
Eventually consistent RICH datatypes
Registers, Maps, Sets, Graphs, etc.
Need a Monotonic merge function
2 types:
CvRDT—convergent—state-based
keep all history in the data type—like a vector clock
clients can go offline
eventually converge as long as all changes eventually reaches all replicas
has a garbage collection problem—GC needs full consistency
CmRDT—commutative—operations-based
send all state-changing operations to all replicas
needs a reliable broadcast channel
no garbage problem
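As a concrete CvRDT, here is a grow-only counter with a monotonic merge function (a sketch of the idea, not any particular CRDT library):

```python
class GCounter:
    """State-based (CvRDT) grow-only counter: each replica increments
    only its own slot. Merge is a pointwise max -- commutative,
    associative and idempotent -- so replicas converge regardless of
    the order or number of times states are exchanged."""
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}

    def increment(self, n: int = 1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter"):
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3); b.increment(2)
a.merge(b); b.merge(a)            # gossip in either direction
print(a.value(), b.value())       # 5 5
```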
But, HOW can we Scale yet provide Transactional Integrity?
Start by reading this paper, then read it again.
Can’t use distributed transactions.
So what should we use?
Let’s look at a few building blocks for making this possible.
First: Explicitly model state transitions in Domain Events
Think in Facts
Things that have completed
Always Immutable
Can’t change the past
Verbs in past tense
CustomerRelocated
CargoShipped
InvoiceSent
Second: Use an Event Log:
The Event Log persists Domain Events
Can apply the Single Writer Principle
Append-Only Logging: AOL
Can log to
Local
Memory Mapped files (ByteBuffers in Java)
File based Journals (LevelDB etc.)
Replicated
Homegrown replicated versions (using Paxos/Raft)
Like Greg Young’s EventStore
Fully replicated NOSQL DB backends
Or regular SQL DBs
Read The Log by Jay Kreps
Stores Facts: have already happened
The log is a DB of Facts—immutable Domain Events
Knowledge only grows
Never delete anything
Accountants never delete anything: they keep it in the Ledger
Can look at it from the perspective of two different models:
1 single event log—Datomic, Oracle TX Log
Single fully consistent snapshot of DB
Reads are “free”
Limited scalability
Multiple sharded event logs—Event Sourcing
Multiple internally consistent views
Aggregate Root is consistency boundary
Strong Consistency within AR
Eventual Consistency between AR
=> Joins are eventually consistent
Unlimited scalability
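Putting the pieces together — past-tense Domain Events, single writer, append-only persistence — an Event Log can be sketched like this (the class and event names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)            # a Fact: immutable, verb in past tense
class CargoShipped:
    cargo_id: str
    destination: str

class EventLog:
    """Append-only log of Domain Events: knowledge only grows and
    nothing is ever deleted. Current state is a fold over the log."""
    def __init__(self):
        self._events = []           # only the single writer appends here

    def append(self, event):
        self._events.append(event)

    def replay(self):
        return tuple(self._events)  # immutable view for readers

log = EventLog()
log.append(CargoShipped("c-1", "Rotterdam"))
log.append(CargoShipped("c-2", "Oslo"))
shipped = [e.cargo_id for e in log.replay()]
print(shipped)   # ['c-1', 'c-2']
```

With multiple sharded logs, each Aggregate Root would own one such log (strongly consistent internally), and views joining across them would be eventually consistent.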
By now I hope it is clear that the simplest way to scale out is to fully embrace
Share Nothing
Async message passing
You have zero shared state => Uncontended local state
Independent processes communicating using Values
Gives us what we need:
Great Locality of Reference
Minimized Contention/Coordination
If possible—use CRDTs to model shared state
The KEY here is: Location Transparency
Should not be underestimated
It is not transparent distributed computing
Does not violate Waldo’s ‘A Note On Distributed Computing’
But the opposite:
Explicit distributed computing
Local communication is an optimization
Embrace the Network and the essence of it:
Locality of data
Async message passing
This gives you a:
One model
one thing to learn and understand
with one set of semantics
regardless if we scale UP or OUT
Instead of having to use two completely different models…
Runtime that can optimize communication by improving
Locality
Communication
Adaptive routing protocols—gather metrics and act
What I’ve tried to highlight in this talk is that You can think of Scalability very much like Escher’s painting Print Gallery
Small to large—at every level
It is basically the same
Small “machines” with local memory
Communicating with async messages
The same design principles can be used to solve the problem at any level
Regarding the video:
Animation of Escher’s Print Gallery
The original painting had a blank hole in the middle.
Left a few questions:
What is missing? What is really in this hole?
Why did Escher not paint it out? What was the problem?
Escher left sketches of how he drew the perspectives—mathematically
Can be explained and completed mathematically (Droste effect)
Escher had an incredible mathematical intuition
Read more here: http://escherdroste.math.leidenuniv.nl/index.php?menu=intro
If we apply this way of looking at things to systems.
It’s all separate “machines” or “units”
with local memory communicating
with async message passing
Embrace this fact.
So…
To make it scale on
multiple independent processing units
all with local memory
communicating with async message passing
The same challenges and (conceptually) the same solutions
The techniques and technologies will vary.
But the principles stays the same:
Share Nothing Architecture
Building on a Message-driven foundation.
Decoupling in Time and Space
Location Transparency