2. What is a distributed system?
A distributed system is a collection of independent
computers that coordinate their activities, share
resources, and appear to users as a single coherent
system.
3. Why do we need distributed
systems?
• The nature of the application requires a distributed
network/system
• Availability/Reliability (no single point of failure)
• Performance (a group of commodity servers gives
more performance than one supercomputer)
• Cost efficiency (a group of commodity servers costs less
than one supercomputer)
4. Examples
• Telecom networks (telephone/computer networks)
• WWW, peer-to-peer networks
• Multiplayer online games
• Distributed databases
• Network file systems
• Aircraft control systems
• Scientific computing (cluster/grid computing)
• Distributed rendering
5. Distributed systems characteristics
• Lack of a global clock
• Multiple autonomous components
• Components are not shared by all users
• Resources may not be accessible
• Software runs in concurrent processes on different processors
• Multiple points of control (distributed management)
• Multiple points of failure (fault tolerance)
• The structure of the system (network topology, network
latency, number of computers) is not known in advance
• Each computer has only a limited, incomplete view of the system.
6. Advantages over centralized
systems
Scalability
• The system can easily be expanded by adding more machines as needed.
Redundancy
• Several machines can provide the same services, so if one is unavailable, work does not stop.
Economics
• A collection of microprocessors offers a better price/performance ratio than mainframes:
a cost-effective way to increase computing power.
Reliability
• If one machine crashes, the system as a whole can still survive.
Speed
• A distributed system may have more total computing power than a mainframe.
Incremental growth
• Computing power can be added in small increments.
7. Advantages over independent PCs
Data sharing
• Allow many users to access common data
Resource sharing
• Allow shared access to common resources
Communication
• Enhance human-to-human communication
Flexibility
• Spread the workload over the available machines
8. Parallel computing vs.
distributed computing
• In parallel computing, all processors may have
access to a shared memory to exchange
information between processors.
• In distributed computing, each processor has its
own private memory (distributed memory).
Information is exchanged by passing messages
between the processors.
9. Algorithms
Parallel algorithms in shared-memory model
• All computers have access to a shared memory. The algorithm designer chooses the
program executed by each computer.
Parallel algorithms in message-passing model
• The algorithm designer chooses the structure of the network, as well as the program
executed by each computer.
Distributed algorithms in message-passing model
• The algorithm designer only chooses the computer program. All computers run the
same program. The system must work correctly regardless of the structure of the
network.
10. It turns out that distributed
systems have some fundamental
problems!
11. Byzantine fault-tolerance problem
The objective of Byzantine fault tolerance is to defend
against Byzantine failures, in which components
of a system fail in arbitrary ways.
Known algorithms can ensure correct operation only if
fewer than 1/3 of the processes are faulty.
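The "fewer than 1/3" bound is usually written as n >= 3f + 1. A minimal sketch (the helper name is hypothetical) of how many Byzantine failures a cluster of n processes can tolerate:

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 holds, i.e. the number of
    arbitrarily-failing (Byzantine) processes the cluster tolerates."""
    return (n - 1) // 3

# e.g. a 4-node cluster tolerates 1 Byzantine node, a 10-node cluster 3.
```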
13. Consensus problem
• Agreeing on the identity of a leader
• State-machine replication
• Atomic broadcasts
There are a number of protocols that solve the consensus problem in distributed
systems, such as the widely used `Paxos consensus protocol` http://en.wikipedia.org/wiki/Paxos_algorithm
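This is not Paxos itself, but a sketch of the majority-quorum idea such protocols rely on: a value is chosen only once a strict majority of nodes agrees on it (function name is hypothetical):

```python
from collections import Counter

def majority_value(votes):
    """Return the value held by a strict majority of nodes, or None
    if no value has been chosen yet."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count > len(votes) // 2 else None
```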
15. Grid computing
Grid computing is the collection of computer
resources from multiple locations to reach a common
goal. What distinguishes grid computing from
conventional high performance computing systems
such as cluster computing is that grids tend to be more
loosely coupled, heterogeneous, and geographically
dispersed.
16. Cluster computing
Computer clustering relies on a centralized
management approach which makes the nodes
available as orchestrated shared servers. It is distinct
from other approaches such as peer-to-peer or grid
computing which also use many nodes, but with a far
more distributed nature.
17. Distributed systems design and
architecture principles
• The art of simplicity
• Scaling out (X/Y/Z-axis)
• Aggressive use of caching
• Using messaging whenever possible
• Redundancy to achieve HA
• Replication
• Sharding
• Scaling your database layer
• Data locality
• Consistency
• Fault tolerance
• CAP theorem
19. HA nodes configuration
Active/active (Load balanced)
• Traffic intended for the failed node is either passed onto an
existing node or load balanced across the remaining nodes.
Active/passive
• Provides a fully redundant instance of each node, which is
only brought online when its associated primary node fails:
Hot standby
• Software components are installed and available on both
primary and secondary nodes.
Warm standby
• The software component is installed and available on
the secondary node. The secondary node is up and
running.
Cold standby
• The secondary node acts as a backup of another identical
primary system. It will be installed and configured only
when the primary node breaks down for the first time.
20. Redundancy as is
• Redundant Web/App Servers
• Redundant databases
• Disk mirroring
• Redundant network
• Redundant storage network
• Redundant electrical power
21. Redundancy in HA cluster
• Easy start/stop procedures
• Using NAS/SAN shared storage
• App should be able to store its state in shared
storage
• App should be able to restart from the stored
shared state on another node
• App shouldn't corrupt data if it crashes or is
restarted
22. Replication
Replication in computing involves sharing information so as to
ensure consistency between redundant resources.
• Primary-backup (master-slave) schema – only the primary node
processes requests.
• Multi-primary (multi-master) schema – all nodes process
requests simultaneously and distribute state
between each other.
Backup differs from replication in that it saves a copy of data
unchanged for a long period of time. Replicas, on the other
hand, undergo frequent updates and quickly lose any
historical state.
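The primary-backup schema above can be sketched in a few lines. This is a toy in-process model (the `Node`/`Primary` names are hypothetical): writes go only to the primary, which synchronously copies each one to every backup:

```python
class Node:
    """A replica holding a simple key-value store."""
    def __init__(self):
        self.store = {}

class Primary(Node):
    """Only the primary accepts writes; each write is forwarded
    synchronously to all backups before returning."""
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, key, value):
        self.store[key] = value
        for b in self.backups:      # synchronous replication step
            b.store[key] = value
```

In a real system the forwarding happens over the network and a failed backup must be detected and handled, which this sketch leaves out.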
23. Replication models
• Transactional replication. Synchronous replication to a number of nodes.
• State machine replication. Using a state machine based on the Paxos
algorithm.
• Virtual synchrony (Performance over fault-tolerance). Sending
asynchronous events to other nodes.
• Synchronous replication (Consistency over Performance) - guarantees
"zero data loss" by the means of atomic write operation.
• Asynchronous replication (Performance over Consistency) (Eventual
consistency) - write is considered complete as soon as local storage
acknowledges it. Remote storage is updated, but probably with a
small lag.
24. Sharding (Partitioning)
Sharding is the process of storing data records across multiple
machines to meet the demands of data growth.
Why sharding?
• High query rates can exhaust the CPU capacity of the
server.
• Larger data sets exceed the storage capacity of a single
machine.
• Finally, working set sizes larger than the system’s RAM stress
the I/O capacity of disk drives.
25. Sharding (Partitioning)
• Sharding reduces the number
of operations each shard
handles.
• Sharding reduces the amount
of data that each server needs
to store.
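One common way to decide which machine stores a record is hash-based sharding. A minimal sketch (the function name is hypothetical; a stable hash like MD5 is used instead of Python's per-run randomized `hash()`):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to one of num_shards machines deterministically."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

Note that with plain modulo hashing, changing `num_shards` remaps most keys; consistent hashing is the usual refinement when shards are added or removed often.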
26. Data Partitioning Principles
[Diagram, © 2012 GigaSpaces Ltd.: a feeder spreads partitioned data across
virtual machines; a second variant adds a replicated backup per partition
(Primary 1/Backup 1, Primary 2/Backup 2).]
27. Split-brain problem
Occurs when connectivity between nodes in a cluster is lost
and the cluster splits into several parts.
Solutions:
• Optimistic approach (Availability over Consistency)
o Leave as is and rely on later resync (Hazelcast)
• Pessimistic approach (Consistency over Availability)
o Keep only one partition live until connectivity is restored (MongoDB)
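The pessimistic approach is usually implemented with a majority quorum: a partition keeps serving only if it can still see more than half of the cluster. A minimal sketch (function name is hypothetical):

```python
def partition_stays_live(visible_nodes: int, cluster_size: int) -> bool:
    """Only the partition seeing a strict majority of the cluster
    keeps serving requests; all others step down."""
    return visible_nodes > cluster_size // 2
```

With an even cluster size, a clean 50/50 split means neither side has a majority and both step down, which is why odd cluster sizes (or a tie-breaker node) are preferred.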
28. Consistency
Strong
• After an update completes, any subsequent access will
return the updated value.
Weak
• The system does not guarantee that subsequent
accesses will return the updated value.
Eventual
• The storage system guarantees that if no new updates
are made to the object, eventually all accesses will return
the last updated value.
29. Eventually consistent
Strong => W + R > N
Weak/Eventual => W + R <= N
Optimized read => R=1, W=N
Optimized write => W=1, R=N
N – number of nodes
W – number of replicas to acknowledge an update
R – number of replicas contacted for read
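The W + R rule above can be checked mechanically. A small sketch (function name is hypothetical) classifying a quorum configuration:

```python
def consistency_mode(n: int, w: int, r: int) -> str:
    """Strong consistency requires the read and write quorums to
    overlap in at least one replica, i.e. W + R > N."""
    return "strong" if w + r > n else "weak/eventual"
```

For example, with N=3 the common W=2, R=2 setup is strong; the read-optimized R=1 only stays strong if every replica acknowledges writes (W=N).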
30. Fault tolerance
(Architecture concepts)
Fault tolerant system:
• No single point of failure
• Fault isolation
• Roll-back/Roll-forward procedures
Approaches:
• Replication
• Redundancy
• Diversity – several alternative implementations of
some functionality
31. Fault tolerance
(Design principles)
Design using fault isolated “swimlanes”
Never trust single point of failure
Avoid putting systems in series
Ensure you have “switch on/switch off” for your new functionality
32. Data locality
Put data closer to clients by scaling along the Z-axis.
Locate processing units near the data to be processed.
33. BASE
• Basically Available
• Soft-state
• Eventual consistency
An alternative model to the well-known ACID, used in
distributed systems to relax strong consistency
constraints in order to achieve higher Availability
together with Partition Tolerance, as per the CAP theorem.
36. Eric Brewer’s quote
“Because partitions are rare, CAP should allow perfect C and A most of
the time, but when partitions are present or perceived, a strategy that
detects partitions and explicitly accounts for them is in order. This
strategy should have three steps: detect partitions, enter an explicit
partition mode that can limit some operations, and initiate a recovery
process to restore consistency and compensate for mistakes made
during a partition.”
39. Scaling out (X/Y/Z axis)
[X-Axis]: Horizontal duplication (design to clone things)
[Y-Axis]: Split by Function, Service or Resource (design to split different things)
[Z-Axis]: Lookups split (design to split similar things)
40. The art of simplicity
KISS (Keep it simple). Don’t overengineer a solution.
Simplify solution 3 times over (scope, design, implementation)
Reduce DNS lookups. Reduce objects where possible (Google main page)
Use homogenous networks where possible
Avoid too many traffic redirects
Don’t check your work (avoid defensive programming)
Relax temporal constraints where possible
41. Aggressive use of caching
Use expires headers
Cache AJAX calls
Leverage Page Caches (Proxy Web Servers)
Utilize Application caches
Use Object Caches (ORM level)
Put caches in their own tier
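The application/object caches above commonly follow the cache-aside pattern: check the cache first, fall back to the source of truth on a miss, then populate the cache. A minimal sketch with hypothetical names (`get_user`, `load_from_db`):

```python
cache = {}  # stand-in for an application cache tier

def get_user(user_id, load_from_db):
    """Cache-aside read: hit the cache first, load and populate on a miss."""
    if user_id in cache:
        return cache[user_id]          # cache hit, no backend call
    value = load_from_db(user_id)      # expensive call only on a miss
    cache[user_id] = value
    return value
```

A real cache would also need expiry/invalidation, which this sketch omits.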
43. Using messaging whenever
possible
• Communicate asynchronously as
much as possible
• Ensure your message bus can scale
• Avoid overcrowding your message
bus
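The asynchronous-communication point can be sketched with an in-process queue standing in for a message bus (names like `bus` and `worker` are hypothetical): producers enqueue and return immediately, while a background consumer drains the queue:

```python
import queue
import threading

bus = queue.Queue()   # stand-in for a message bus
results = []

def worker():
    """Background consumer: drains the bus until a shutdown sentinel."""
    while True:
        msg = bus.get()
        if msg is None:            # shutdown sentinel
            break
        results.append(msg.upper())

t = threading.Thread(target=worker)
t.start()

for m in ("hello", "world"):
    bus.put(m)                     # fire-and-forget: producer does not block
bus.put(None)
t.join()
```

A real message bus adds durability, acknowledgements, and horizontal scaling of consumers, which this single-process sketch does not attempt.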
44. Scaling your database layer
Denormalize data where possible because relationships are costly.
Use the right type of lock.
Avoid using multiphase commits and distributed transactions.
Avoid using “select for update” statements.
Don’t select everything.