Distributed Systems: scalability and high availability




Renato Lucindo - lucindo.github.com - @rlucindo
Renato Lucindo

 Call me Lucindo (or Linus)
 2002 - Bachelor Computer Science
 2007 - M.Sc. Computer Science (Combinatorial Optimization)
 7+ years developing distributed systems




 My default answer: "I don't know."
Agenda


 Scalability

 High Availability

 Problems

 Tips and Tricks

 Learning More
Distributed Systems

  Multiple computers that interact with each other over a network to achieve a common goal
  Purpose
     Scalability
     High availability




                                     source: http://www.cnds.jhu.edu/
Scalability

  A system's ability to gracefully handle a growing amount of work

  Scale up (vertical)
     Add resources to a single node
     Improve existing code to handle more work

  Scale out (horizontal)
     Add more nodes to a system
     Linear (or better) scalability
Scalability - Vertical

  Add: CPU, Memory, Disks (bigger box)
  Handling more simultaneous:
     Connections
     Operations
     Users
  Choose a good I/O and concurrency model
     Non-blocking I/O
     Asynchronous I/O
     Threads (single, pool, per-connection)
     Event handling patterns (Reactor, Proactor, ...)
  Memory model?
     STM (Software Transactional Memory)
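The Reactor pattern listed above can be sketched with Python's standard `selectors` module: one thread waits for readiness events on non-blocking sockets and dispatches each ready socket to its registered handler. This is a minimal, single-threaded illustration (an echo service) under the assumption that replies are small; `Reactor` and `echo_handler` are hypothetical names, and a Proactor would instead be notified when an I/O operation has *completed*.

```python
import selectors
import socket
from typing import Callable


class Reactor:
    """Single-threaded Reactor: one loop waits for readiness events and
    dispatches each ready socket to the handler registered for it."""

    def __init__(self) -> None:
        self.selector = selectors.DefaultSelector()

    def register(self, sock: socket.socket,
                 handler: Callable[[socket.socket], None]) -> None:
        sock.setblocking(False)  # a handler must never block the whole loop
        self.selector.register(sock, selectors.EVENT_READ, handler)

    def unregister(self, sock: socket.socket) -> None:
        self.selector.unregister(sock)

    def run_once(self, timeout: float = 1.0) -> int:
        """Wait up to `timeout` seconds, dispatch ready sockets, return how many fired."""
        events = self.selector.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)  # key.data is the handler passed to register()
        return len(events)


def echo_handler(reactor: Reactor) -> Callable[[socket.socket], None]:
    def on_readable(sock: socket.socket) -> None:
        data = sock.recv(4096)
        if data:
            # sendall() on a non-blocking socket is acceptable for small replies;
            # a real server would buffer output and wait for EVENT_WRITE.
            sock.sendall(data)
        else:  # empty read: the peer closed the connection
            reactor.unregister(sock)
            sock.close()
    return on_readable
```

Usage: register the server side of a connection with `reactor.register(conn, echo_handler(reactor))` and call `reactor.run_once()` in a loop; slow clients only cost a registered socket, not a blocked thread.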
Scalability - Vertical

  Careful with numbers
      Requests per second
      # of Connections
      Simultaneous operations
  Event handling
      Think front-end
      Slow connections/clients
      It's slower than other options
  When in doubt, go async
  Back-end
      Thread pool (thread per-connection)
      No events
      Process per-core
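For the back-end style above (plain blocking code, no event loop), a bounded thread pool is a common building block. A minimal sketch using `concurrent.futures.ThreadPoolExecutor`; `handle_request` is a hypothetical stand-in for real work, and the `4 × cores` pool size is an assumed starting point to be tuned by measurement, not a figure from the slides.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def handle_request(payload: bytes) -> bytes:
    # Hypothetical blocking back-end work (a disk read, a database call...).
    return payload.upper()


def serve_batch(requests: List[bytes], workers: Optional[int] = None) -> List[bytes]:
    """Run blocking handlers on a fixed-size thread pool.

    The pool bounds concurrency: a burst of requests waits for a free
    thread instead of spawning one thread per request without limit."""
    if workers is None:
        workers = 4 * (os.cpu_count() or 1)  # assumed heuristic, tune by measuring
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))  # results keep input order
```

For the process-per-core variant, `ProcessPoolExecutor(max_workers=os.cpu_count())` has the same shape, at the cost of pickling arguments between processes.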
Scalability - Horizontal

  Add nodes to handle more work
  Front-end
     Straightforward
     Stateless
  Back-end
     Master/Slave(s)
     Partitioning
         DHT
         Volatile Index
Scalability - Horizontal

  Master/Slave
  Writes go to a single master
  Reads go to the slaves (one or more)
  Scales reads
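A minimal sketch of master/slave routing, assuming queries arrive as SQL strings and that anything other than a `SELECT` is a write (a deliberately naive rule). `ReplicatedRouter` and the node names are hypothetical. Reads rotate round-robin over the slaves; because replication is usually asynchronous, a read may not yet see a write the same client just made.

```python
import itertools
from typing import List


class ReplicatedRouter:
    """Send writes to the single master and spread reads over the slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # With no read replicas, the master serves reads as well.
        self._reads = itertools.cycle(slaves or [master])

    def node_for(self, query: str) -> str:
        # Naive classification: anything that is not a SELECT is a write.
        is_read = query.lstrip().upper().startswith("SELECT")
        return next(self._reads) if is_read else self.master
```

Adding slaves scales read throughput; write throughput is still bounded by the one master, which is why the next step is partitioning.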
Scalability - Horizontal

  Partitioning (Sharding)
     Distribute data across nodes
  Generally involves data denormalization
  Where is a given piece of data?
     Master Index
     Hash (DHT, Consistent Hashing)
     Volatile Index
  Joins done at the application level
  NoSQL friendly
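The "Hash" option above is often implemented with consistent hashing. A minimal sketch, assuming string keys and node names; `ConsistentHashRing` and the choice of 100 virtual nodes per physical node are illustrative assumptions. Its key property: removing a node only moves the keys that node owned, instead of reshuffling almost everything as `hash(key) % n` would.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class ConsistentHashRing:
    """Consistent hashing with virtual nodes.

    Each physical node is placed on the ring `vnodes` times; a key belongs
    to the first virtual node clockwise from the key's hash."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._hashes: List[int] = []      # sorted ring positions
        self._owners: Dict[int, str] = {}  # ring position -> physical node
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 is used only for its even spread, not for security.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if h not in self._owners:  # ignore the (unlikely) collision
                self._owners[h] = node
                bisect.insort(self._hashes, h)

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if self._owners.get(h) == node:
                del self._owners[h]
                self._hashes.remove(h)

    def node_for(self, key: str) -> str:
        if not self._hashes:
            raise LookupError("ring is empty")
        # First position strictly after the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[self._hashes[idx]]
```

Virtual nodes smooth out the load: with one position per node, a few unlucky hashes could leave one node owning most of the ring.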
Scalability - Horizontal

  Volatile Index: build and maintain the data index as cached information (on all clients)
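One way to read the volatile index: each client keeps its own cache of "which node holds this key", rebuilt on demand and never treated as the source of truth. A minimal sketch under that interpretation; `VolatileIndex` and the `locate` callback (for example, asking every node) are hypothetical.

```python
from typing import Callable, Dict, Optional


class VolatileIndex:
    """Per-client cache of "which node holds key K".

    Nothing here is authoritative: a miss, or an entry dropped after the
    cached node answered "not here", falls back to a slower `locate`
    lookup (hypothetical, e.g. asking every node) and refills the cache."""

    def __init__(self, locate: Callable[[str], Optional[str]]) -> None:
        self._locate = locate
        self._index: Dict[str, str] = {}

    def node_for(self, key: str) -> Optional[str]:
        node = self._index.get(key)
        if node is None:
            node = self._locate(key)  # slow path
            if node is not None:
                self._index[key] = node  # fast path next time
        return node

    def invalidate(self, key: str) -> None:
        """Call when the cached node reports it no longer holds `key`."""
        self._index.pop(key, None)
```

Because the index can always be rebuilt, losing it (a client restart) costs only extra lookups, not data.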
High Availability

            "Processes, as well as people, die"


  Handle hardware and software failures
      Eliminate single points of failure
  Redundancy
  Failover
  Replicas
High Availability - Failover/Redundancy
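Client-side failover can be sketched in a few lines. This assumes each redundant replica is reached through a caller-supplied `op(replica)` function (hypothetical) that raises `ConnectionError` or `TimeoutError` when the replica is unavailable; other exceptions are treated as real errors and propagate immediately.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(replicas: Sequence[str], op: Callable[[str], T]) -> T:
    """Try each redundant replica in order and return the first success.

    Raises only if every replica fails, so one dead node is not a
    single point of failure for the caller."""
    errors: List[str] = []
    for replica in replicas:
        try:
            return op(replica)
        except (ConnectionError, TimeoutError) as exc:
            errors.append(f"{replica}: {exc}")  # remember why, then try the next one
    raise ConnectionError("all replicas failed: " + "; ".join(errors))
```

In practice each `op` call should carry its own timeout, otherwise a hung primary delays the failover indefinitely.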
High Availability - Replicas

  Two or more copies of the same data
  Replica granularity
     From node replica to "row" replica
  Load balancing
  Write concurrency
  Replica updates
  Key to high availability, and the root of several problems
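Write concurrency and replica updates are often handled with quorums, as in Dynamo: with N replicas, a write must reach W of them and a read must consult R, and choosing R + W > N guarantees that every read quorum overlaps every write quorum. A simplified in-memory sketch; `QuorumStore`, the `up` health flags and the caller-supplied integer versions are assumptions (Dynamo itself uses vector clocks), and the class is not thread-safe.

```python
from typing import Dict, List, Optional, Tuple


class QuorumStore:
    """N in-memory replicas with quorum writes (W) and quorum reads (R)."""

    def __init__(self, n: int = 3, w: int = 2, r: int = 2) -> None:
        if r + w <= n:
            raise ValueError("need R + W > N for overlapping quorums")
        self.n, self.w, self.r = n, w, r
        # Each replica maps key -> (version, value).
        self.replicas: List[Dict[str, Tuple[int, str]]] = [{} for _ in range(n)]
        self.up = [True] * n  # simulated node health

    def put(self, key: str, value: str, version: int) -> None:
        acks = 0
        for i, replica in enumerate(self.replicas):
            if self.up[i]:
                replica[key] = (version, value)
                acks += 1
        # As in real systems, a failed write may still leave partial copies.
        if acks < self.w:
            raise RuntimeError(f"write quorum not met ({acks}/{self.w})")

    def get(self, key: str) -> Optional[str]:
        answers = [rep.get(key) for i, rep in enumerate(self.replicas) if self.up[i]]
        if len(answers) < self.r:
            raise RuntimeError(f"read quorum not met ({len(answers)}/{self.r})")
        found = [a for a in answers[: self.r] if a is not None]
        # The highest version wins; stale copies are outvoted.
        return max(found)[1] if found else None
```

The quorum sizes are the availability knob: smaller R or W answer more requests during failures, at the price of the overlap guarantee.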
Problems
Problems - CAP Theorem
Problems - CAP Theorem

 Consistency: all operations (reads/writes) yield a globally consistent state

 Availability: every request (on a non-failed server) must receive a response

 Partition Tolerance: the system keeps working even when nodes cannot communicate with each other



                     Pick Two
Problems - CAP Theorem

 C + A: network problems might stop the system

 Examples:
    Oracle RAC, IBM DB2 Parallel
    RDBMS (Master/Slave)
    Google File System
    HDFS (Hadoop)
Problems - CAP Theorem

 C + P: clients can't always perform operations

 Examples:
    Distributed lock-systems: Chubby, ZooKeeper
    Paxos protocol (consensus)
    Bigtable, HBase
    Hypertable
    MongoDB
Problems - CAP Theorem

 A + P: clients may read inconsistent (old or undone) data

 Examples:
    Amazon Dynamo
    Cassandra
    Voldemort
    CouchDB
    Riak
    Caches
Problems with the CAP Theorem

 In practice, C + A and C + P systems are the same.
     C + A: not tolerant of network partitions
     C + P: not available when a network partition occurs
 Big problem: network partition
     Not so big (how often does it happen?)
 Pick two
     Availability
     Consistency
 The forgotten: Latency
     Or: how long should the system wait before considering the network partitioned?
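In practice, "how long to wait" becomes the timeout of a failure detector. A minimal sketch, assuming peers send periodic heartbeats; `HeartbeatDetector` is hypothetical and the clock is injectable so the behaviour can be tested without sleeping. A short timeout reacts quickly but mistakes slow peers for partitioned ones; a long timeout is more accurate but keeps clients waiting.

```python
import time
from typing import Callable, Dict, List


class HeartbeatDetector:
    """Timeout-based failure detector: a peer is suspected once no
    heartbeat has arrived from it for more than `timeout` seconds."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, peer: str) -> None:
        self.last_seen[peer] = self.clock()

    def suspected(self) -> List[str]:
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)
```

A suspected peer is not necessarily dead; the detector cannot distinguish a crashed node, a slow node and a partitioned network.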
Problems - Real World

Every component may fail:
   Network failure
   Hardware failure
   Electricity
   Natural disasters
   Code failure
Tips & Tricks
Tips & Tricks - Pyramid

  Capacity (connections, operations, ...) Pyramid
Tips & Tricks - Reply Fast

  FAIL Fast
  Break complex requests into smaller ones
  Use timeouts
  No transactions
  Be aware that a single slow operation or component can generate contention
  Self-denial attack
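Failing fast can be as simple as refusing work when a component is already saturated, instead of queueing requests behind a slow dependency. A minimal sketch; `FailFastGate`, `Overloaded` and the in-flight limit are hypothetical, and a real service would map `Overloaded` to an immediate error reply (for example HTTP 503).

```python
import threading
from typing import Any, Callable


class Overloaded(Exception):
    """Raised immediately when the component is saturated."""


class FailFastGate:
    """Admit at most `limit` concurrent requests and reject the rest at once.

    Rejecting immediately keeps a slow dependency from piling up waiting
    requests (and their threads, sockets and memory) behind it."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # blocking=False: never wait for a slot; fail fast instead.
        if not self._slots.acquire(blocking=False):
            raise Overloaded("too many in-flight requests")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

Combined with timeouts on `fn` itself, this keeps one slow operation from turning into system-wide contention.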
Tips & Tricks - Cache

  Cache: component locations, data, DNS lookups, previous requests, etc.
  Use a negative cache for failed requests (short expiration)
  Don't rely on the cache
  Your system must still work with no cache
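A read-through cache with negative entries can be sketched as follows. Failed lookups are cached too, but with a much shorter TTL, so a failing back-end is not hammered while it recovers yet recovery is noticed quickly. `NegativeTTLCache`, the `fetch` callback and the TTL values are assumptions; `fetch` remains the source of truth, so the system still works if the cache is emptied.

```python
import time
from typing import Any, Callable, Dict, Tuple


class NegativeTTLCache:
    """Read-through cache with separate TTLs for successes and failures."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float = 60.0,
                 negative_ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.ttl, self.negative_ttl = ttl, negative_ttl
        self.clock = clock
        # key -> (expires_at, succeeded, value_or_exception)
        self._entries: Dict[str, Tuple[float, bool, Any]] = {}

    def get(self, key: str) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is None or entry[0] <= now:  # missing or expired
            try:
                entry = (now + self.ttl, True, self.fetch(key))
            except Exception as exc:  # remember the failure, briefly
                entry = (now + self.negative_ttl, False, exc)
            self._entries[key] = entry
        _expires, ok, value = entry
        if not ok:
            raise value
        return value
```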
Tips & Tricks - Queues

  An easy way to add asynchronous processing and decouple your system.
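The decoupling idea can be shown with an in-process `queue.Queue` standing in for a real message broker (an assumption for illustration only). Producers just `put()` work and return; a background worker does the slow part. `start_worker` and the doubling "job" are hypothetical.

```python
import queue
import threading
from typing import List


def start_worker(jobs: queue.Queue, results: List[int]) -> threading.Thread:
    """Consume jobs on a background thread so producers only pay for put()."""

    def loop() -> None:
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: stop the worker
                    return
                results.append(job * 2)  # stand-in for the slow processing step
            finally:
                jobs.task_done()  # lets jobs.join() know this item is finished

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker
```

A bounded queue (`queue.Queue(maxsize=...)`) also gives back-pressure: when consumers fall behind, `put()` blocks (or `put_nowait()` raises `queue.Full`) instead of memory growing without limit.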
Tips & Tricks - DNS
Tips & Tricks - Logs

  Log everything
  Use several log levels
  On every log message, include:
       User
       Request host
       Component involved
       Version
       Filename and line
  If a log level is not enabled, do not process its log messages
       Avoid lookup calls (e.g. gettimeofday)
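With Python's standard `logging` module, the per-message context above can be bound once per request with a `LoggerAdapter`, while file name and line come from the built-in `%(filename)s` and `%(lineno)d` attributes. A minimal sketch; the field names, `APP_VERSION` and `expensive_state_dump` are hypothetical. The `isEnabledFor` guard skips building an expensive message when DEBUG is off.

```python
import logging

LOG_FORMAT = ("%(levelname)s user=%(user)s request_host=%(request_host)s "
              "component=%(component)s version=%(version)s "
              "%(filename)s:%(lineno)d %(message)s")
APP_VERSION = "1.0.0"  # hypothetical; normally injected at build time


def request_logger(base: logging.Logger, user: str, request_host: str,
                   component: str) -> logging.LoggerAdapter:
    """Bind per-request context so every message from this request carries it."""
    return logging.LoggerAdapter(base, {"user": user, "request_host": request_host,
                                        "component": component,
                                        "version": APP_VERSION})


def expensive_state_dump() -> str:
    # Stand-in for costly work (serialization, extra system calls...).
    return "..."


def handle(log: logging.LoggerAdapter) -> None:
    log.info("request accepted")
    # Do not even build the message when DEBUG is disabled.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("state: %s", expensive_state_dump())
```

Passing arguments (`"state: %s", value`) instead of pre-formatted strings also defers formatting until a handler actually emits the record.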
Tips & Tricks - Domino Effect

  Make sure your load balancer won't overload components
  Use smart algorithms
     Load Balance
     Resource Allocation
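One example of a load-aware balancing algorithm is "power of two choices": sample two back-ends at random and send the request to the one with fewer in-flight requests. This is a swapped-in technique for illustration, not necessarily what the slides had in mind; `PowerOfTwoBalancer` is hypothetical. Unlike round-robin, it stops feeding a back-end that is already backed up.

```python
import random
from typing import Dict, Iterable, Optional


class PowerOfTwoBalancer:
    """Pick two random back-ends and route to the less loaded one."""

    def __init__(self, backends: Iterable[str],
                 rng: Optional[random.Random] = None) -> None:
        self.in_flight: Dict[str, int] = {b: 0 for b in backends}
        if not self.in_flight:
            raise ValueError("need at least one back-end")
        self.rng = rng if rng is not None else random.Random()

    def acquire(self) -> str:
        names = list(self.in_flight)
        if len(names) == 1:
            choice = names[0]
        else:
            a, b = self.rng.sample(names, 2)  # two distinct random candidates
            choice = a if self.in_flight[a] <= self.in_flight[b] else b
        self.in_flight[choice] += 1
        return choice

    def release(self, backend: str) -> None:
        """Call when the request sent to `backend` completes."""
        self.in_flight[backend] -= 1
```

Sampling two instead of always taking the global minimum avoids every balancer instance stampeding the same "least loaded" back-end when their load data is slightly stale.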
Tips & Tricks - (Zero) Configuration

  No configuration files
  Use good defaults
  Auto-discovery (multicast, gossip, ...)
  Make everything configurable
     Administrative command
     No need to stop for changes
  Self-adjust automatically when possible
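"Good defaults, but everything configurable without a restart" can be sketched as a settings object that starts from built-in defaults and accepts changes from an administrative command at runtime. `LiveConfig`, `admin_set` and the settings themselves are hypothetical; how the admin command reaches the process (a socket, a CLI) is left out.

```python
import threading
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {  # hypothetical settings with sensible defaults
    "timeout_s": 2.0,
    "max_connections": 1024,
    "log_level": "INFO",
}


class LiveConfig:
    """Settings that start from defaults and can change while running."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[str, Any] = dict(DEFAULTS)

    def get(self, name: str) -> Any:
        with self._lock:
            return self._values[name]

    def admin_set(self, name: str, raw: str) -> None:
        """Handle an admin command such as `set timeout_s 0.5`."""
        if name not in DEFAULTS:
            raise KeyError(f"unknown setting {name!r}")
        # Parse with the default's type so a malformed value is rejected
        # (ValueError) instead of stored; a bool setting would need its own parser.
        value = type(DEFAULTS[name])(raw)
        with self._lock:
            self._values[name] = value
```

Components read settings through `get()` on each use rather than copying them at startup, which is what makes a change take effect without stopping the process.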
Tips & Tricks - STOP Test

  With your system under load: kill -STOP <component> (SIGSTOP freezes the process without closing its connections, simulating a hung component)
Tips & Tricks - Know your tools

  load average (uptime)
  stats tools
      vmstat
      iostat
      mpstat
      tcpstat, tcprstat, etc
  tcpdump, nc, netstat
  tuning
      /proc/net/*
      ulimit
      sysctl
  oprofile
  debugging tools (gdb, valgrind)
  ...
Tips & Tricks - Count

  Count everything
     Connections
     Operations
     Failures
     Successes
     Request times (granularity)
  Total, average, standard deviation
  Monitor counters
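Total, average and standard deviation can be maintained without storing every sample by using Welford's online algorithm. A minimal sketch; `MetricCounter` is a hypothetical name, one instance would be kept per metric (request time, failures, and so on), and it reports the population standard deviation.

```python
import math
import threading


class MetricCounter:
    """Running count, total, mean and standard deviation of a metric."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.count = 0
        self.total = 0.0
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value: float) -> None:
        with self._lock:
            self.count += 1
            self.total += value
            delta = value - self._mean
            self._mean += delta / self.count
            self._m2 += delta * (value - self._mean)  # Welford's update

    @property
    def mean(self) -> float:
        return self._mean

    @property
    def stddev(self) -> float:
        # Population standard deviation; 0.0 until there are two samples.
        return math.sqrt(self._m2 / self.count) if self.count > 1 else 0.0
```

The average alone hides outliers; a growing standard deviation of request times is often the first sign of contention.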
Tips & Tricks - Stability Patterns

  Use Timeouts
  Circuit Breaker
  Bulkheads
  Steady State
  Fail Fast
  Handshaking
  Test Harness
  Decoupling Middleware
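Of the patterns above, the circuit breaker is the easiest to show in isolation. A minimal, single-threaded sketch with hypothetical names and thresholds: after `max_failures` consecutive failures the breaker opens and rejects calls without touching the dependency; after `reset_after` seconds it lets a trial call through (half-open), closing on success and re-opening on failure.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpen(Exception):
    """Raised without calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Closed -> open after repeated failures -> half-open after a delay."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpen("dependency marked unhealthy")
            # Half-open: fall through and let this trial call run.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result
```

Rejecting with `CircuitOpen` is itself a form of failing fast, and gives the struggling dependency time to recover instead of absorbing retries.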
Tips & Tricks - Don't Panic!
Learning More - Books

TCP/IP Illustrated, Vol. 1: The Protocols
Learning More - Books

Unix Network Programming, Vol. 1: The Sockets Networking API
Learning More - Books

Pattern-Oriented Software Architecture, Vol. 2
Learning More - Books

Release It!
Learning More - Papers

 The Google File System
 Bigtable: A Distributed Storage System for Structured Data
 Dynamo: Amazon's Highly Available Key-Value Store
 PNUTS: Yahoo!’s Hosted Data Serving Platform
 MapReduce: Simplified Data Processing on Large Clusters

 Towards robust distributed systems
 Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services
 BASE: An ACID Alternative
 Looking up data in P2P systems
Thanks!!! Questions?

lucindo.github.com - @rlucindo
