Zab: High-performance broadcast for primary-backup systems
Flavio Junqueira, Benjamin Reed, Marco Serafini

Yahoo! Research
June 2011
Setting up the stage


•   Background: ZooKeeper
•   Coordination service
    - Web-scale applications
    - Intensive use (high performance)
    - Source of truth for many applications


ZooKeeper

•   Open source Apache project
•   Used in production
    - Yahoo!
    - Facebook
    - Rackspace
    - ...
                                http://zookeeper.apache.org

ZooKeeper

•   ... is a leader-based, replicated service
    - Processes crash and recover

•   Leader
    - Executes requests
    - Propagates state updates

•   Follower
    - Applies state updates

[Figure: a leader and two followers connected by atomic broadcast; the leader broadcasts, the followers deliver.]

ZooKeeper

•   Client
    - Submits operations to a server
    - If the server is a follower, it forwards the operation to the leader
    - Leader executes and propagates state update

[Figure: a client sends a request to a server; the leader broadcasts and the followers deliver via atomic broadcast.]


ZooKeeper

•   State updates
    - All followers apply the same updates
    - All followers apply them in the same order
    - Atomic broadcast

•   Performance requirements
    - Multiple outstanding operations
    - Low latency and high throughput
ZooKeeper
•   Update configuration and create ready
•   If ready exists, then configuration is consistent

[Figure: four updates in flight from the leader to each of two followers, labelled create /cfg/ready, setData /cfg/server, setData /cfg/client, and del /cfg/ready.]

•   If 1 doesn’t commit, then 2+3 can’t commit
•   If 2+3 don’t commit, then 4 must not commit
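
The dependency between the updates can be illustrated with a small simulation. This is only a sketch under stated assumptions: the numbering 1 = del /cfg/ready, 2 = setData /cfg/server, 3 = setData /cfg/client, 4 = create /cfg/ready is inferred rather than stated on the slide, and the dict-based replica model, the version strings, and every function name are hypothetical.

```python
# Hypothetical model: a replica's state is a dict mapping znode path -> value.
INITIAL = {"/cfg/server": "v1", "/cfg/client": "v1", "/cfg/ready": ""}

# Assumed numbering of the four updates (inferred, not stated on the slide).
UPDATES = [
    ("del", "/cfg/ready", None),        # 1
    ("setData", "/cfg/server", "v2"),   # 2
    ("setData", "/cfg/client", "v2"),   # 3
    ("create", "/cfg/ready", ""),       # 4
]


def apply_update(state, update):
    """Return a new state with one update applied."""
    op, path, value = update
    new_state = dict(state)
    if op == "del":
        new_state.pop(path, None)
    else:  # setData and create both store a value at the path
        new_state[path] = value
    return new_state


def config_consistent(state):
    """If /cfg/ready exists, server and client config must be the same version."""
    if "/cfg/ready" not in state:
        return True
    return state["/cfg/server"] == state["/cfg/client"]


def invariant_holds(updates):
    """Apply updates in order and check the invariant after every step."""
    state = dict(INITIAL)
    for update in updates:
        state = apply_update(state, update)
        if not config_consistent(state):
            return False
    return True


print(invariant_holds(UPDATES))                               # all four, in order
print(invariant_holds(UPDATES[1:]))                           # 1 dropped, 2+3 applied
print(invariant_holds([UPDATES[0], UPDATES[1], UPDATES[3]]))  # 3 dropped, 4 applied
```

Only the first call keeps the invariant; the other two expose a state where ready exists while the configuration is half-updated, which is why a broadcast layer that commits a later update without its predecessors is not enough here.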
ZooKeeper

•   Exploring Paxos
    - Efficient consensus protocol
    - State-machine replication
    - Multiple consecutive instances

•   Why is it not suitable out of the box?
    - Does not guarantee order
    - Multiple outstanding operations

Paxos at a glance
[Figure: message flow among three processes (Acceptor + Learner; Acceptor + Proposer + Learner; Acceptor + Learner) exchanging messages 1a, 1b, 2a, 2b, 3a.]

•   Phase 1: Selects value to propose
    - 1b: Acceptor promises not to accept lower ballots
•   Phase 2: Proposes a value
    - 2b: If quorum, value is chosen
•   Phase 3: Value learned

Paxos run

P3
    1a sent:   27: <1a,3>      28: <1a,3>      29: <1a,3>
    2a sent:   27: <2a, 3, C>  28: <2a, 3, B>  29: <2a, 3, D>

A1 (has accepted A and B from P1)
    Accepted:  27: <1, A>      28: <1, B>
    1b reply:  27: <1b, 1, A>  28: <1b, 1, B>  29: <1b, _, _>

A2 (has accepted C from P2)
    Accepted:  27: <2, C>
    After 2a:  27: <3, C>      28: <3, B>      29: <3, D>

A3
    Accepted:  27: <2, C>
    1b reply:  27: <1b, 2, C>  28: <1b, _, _>  29: <1b, _, _>
    After 2a:  27: <3, C>      28: <3, B>      29: <3, D>

Result: interleaves operations of P1, P2, and P3
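
The interleaving in this run follows from the usual Paxos rule for picking a value after Phase 1. The sketch below is illustrative only: `choose_values` and its arguments are hypothetical names, each 1b reply is modelled as a dict from instance number to an accepted `(ballot, value)` pair, and, as a simplification, every instance with no accepted value receives the proposer's new value.

```python
def choose_values(replies, instances, new_value):
    """Pick a value per instance from a quorum of 1b replies.

    For each instance, the value accepted under the highest ballot wins;
    if no acceptor in the quorum reports a value, the proposer is free to
    use its own (simplification: the same new_value for every free slot).
    """
    chosen = {}
    for instance in instances:
        best = None  # (ballot, value) with the highest ballot seen so far
        for reply in replies:
            accepted = reply.get(instance)
            if accepted is not None and (best is None or accepted[0] > best[0]):
                best = accepted
        chosen[instance] = best[1] if best is not None else new_value
    return chosen


# 1b replies from A1 and A3, as listed in the run above.
a1_reply = {27: (1, "A"), 28: (1, "B")}
a3_reply = {27: (2, "C")}

result = choose_values([a1_reply, a3_reply], [27, 28, 29], "D")
print(result)
```

With these replies P3 ends up proposing C for instance 27, B for 28, and D for 29, so B from P1 is ordered after C from P2 even though P1 proposed A before B and A is lost.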




ZooKeeper

•   Another requirement
    - Minimize downtime
    - Efficient recovery

•   Reduce the amount of state transferred
•   Zab
    - One identifier
    - Missing values for each process

Zab and PO Broadcast
Definitions

•   Processes: Lead or Follow
•   Followers
    - Maintain a history of transactions (updates)

•   Transaction identifiers: ⟨e,c⟩

    - e : epoch number of the leader
    - c : epoch counter
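
A minimal sketch of how such identifiers can be represented and compared, assuming they are ordered first by epoch and then by counter. The `Zxid` class is a hypothetical name, and the 32-bit/32-bit packing into one 64-bit integer mirrors how ZooKeeper's zxid is commonly described; treat the exact layout as an assumption here.

```python
from typing import NamedTuple


class Zxid(NamedTuple):
    """Transaction identifier <e,c>: epoch first, then counter within the epoch."""

    epoch: int
    counter: int

    def pack(self) -> int:
        """Pack into one integer: epoch in the high 32 bits, counter in the low 32."""
        return (self.epoch << 32) | self.counter

    @classmethod
    def unpack(cls, value: int) -> "Zxid":
        """Inverse of pack()."""
        return cls(value >> 32, value & 0xFFFFFFFF)


# Tuples compare lexicographically, so a later epoch always wins
# regardless of the counter.
print(Zxid(1, 1) > Zxid(0, 3))
print(Zxid.unpack(Zxid(2, 5).pack()))
```

Packing preserves the ordering as long as the counter fits in 32 bits, which lets a single integer comparison stand in for the pairwise one.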

Properties of PO Broadcast


•   Integrity
    - Only broadcast transactions are delivered
    - Leader recovers before broadcasting new transactions

•   Total order and agreement
    - Followers deliver the same transactions and in the same order


Primary order

•   Local: Transactions of a leader accepted in order
•   Global: Transactions in history respect the order of epochs
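
The two conditions can be checked mechanically on a follower's history. The function below is a sketch with hypothetical names: a history is a list of `(epoch, counter)` pairs in the order they were accepted, and the check covers ordering only, not gaps (it does not verify that every earlier transaction of the same leader is present).

```python
def respects_primary_order(history):
    """Check ordering of a history given as a list of (epoch, counter) pairs."""
    for prev, cur in zip(history, history[1:]):
        prev_epoch, prev_counter = prev
        cur_epoch, cur_counter = cur
        if cur_epoch < prev_epoch:
            return False  # global: an older epoch appears after a newer one
        if cur_epoch == prev_epoch and cur_counter <= prev_counter:
            return False  # local: one leader's transactions out of order
    return True


# Epoch 1 stands for e and epoch 2 for a later epoch e'.
print(respects_primary_order([(1, 10), (1, 11), (2, 1)]))
print(respects_primary_order([(1, 10), (2, 1), (1, 11)]))
```

The second history accepts a transaction of the old epoch after one of the new epoch, which is the case the global condition rules out.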




Primary order

•   Local: Transactions of a primary accepted in order
•   Global: Transactions in history respect the order of epochs

[Figure: Leader issues abcast(⟨e,10⟩), abcast(⟨e,11⟩), abcast(⟨e,12⟩); timeline for one Follower.]



Primary order

•   Local: Transactions of a primary accepted in order
•   Global: Transactions in history respect the order of epochs

[Figure: Leader issues abcast(⟨e,10⟩) and abcast(⟨e,11⟩); Leader’ issues abcast(⟨e’,1⟩) after both; timeline for one Follower.]
Primary order

•   Local: Transactions of a primary accepted in order
•   Global: Transactions in history respect the order of epochs

[Figure: Leader issues abcast(⟨e,10⟩) and abcast(⟨e,11⟩); Leader’ issues abcast(⟨e’,1⟩), drawn between the two; timeline for one Follower.]
Zab in Phases

•   Phase 0 - Leader election
    - Prospective leader elected

•   Phase 1 - Discovery
    - Followers promise not to go back to previous epochs
    - Followers send their last epoch and history to the prospective leader
    - Prospective leader selects the longest history of the latest epoch
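
The selection step at the end of Discovery can be sketched as follows. This is an illustration, not the protocol's definition: `select_initial_history` is a hypothetical name, each response is modelled as `(accepted_epoch, history)`, and "longest history of the latest epoch" is approximated by ranking on the accepted epoch and then on the last identifier in the history.

```python
def select_initial_history(responses):
    """Pick the follower whose history seeds the new epoch.

    responses maps a follower name to (accepted_epoch, history), where
    history is a list of (epoch, counter) pairs in accepted order.
    """
    def rank(item):
        _, (accepted_epoch, history) = item
        last_zxid = history[-1] if history else (0, 0)
        return (accepted_epoch, last_zxid)

    follower, (_, history) = max(responses.items(), key=rank)
    return follower, list(history)


# Hypothetical responses from three followers, all in accepted epoch 0.
responses = {
    "f1": (0, [(0, 1), (0, 2), (0, 3)]),
    "f2": (0, [(0, 1), (0, 2)]),
    "f3": (0, [(0, 1)]),
}
print(select_initial_history(responses))
```

Here f1 holds the most recent identifier in the latest accepted epoch, so its history becomes the initial history of the new epoch.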
Zab in Phases

•   Phase 2 - Synchronization
    - Prospective leader sends new history to followers
    - Followers confirm leadership

•   Phase 3 - Broadcast
    - Leader proposes new transactions
    - Leader commits if quorum acknowledges
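
The commit rule of the Broadcast phase can be sketched as a small bookkeeping class. Everything here is an assumption made for illustration: the class and method names are hypothetical, a quorum is taken to be a strict majority of the ensemble, the leader counts its own acknowledgement, and commits are released strictly in proposal order.

```python
class BroadcastLeader:
    """Tracks acknowledgements and commits proposals in proposal order."""

    def __init__(self, ensemble_size):
        self.ensemble_size = ensemble_size
        self.acks = {}       # zxid -> set of server ids that acknowledged
        self.pending = []    # proposed but uncommitted zxids, oldest first
        self.committed = []  # committed zxids, in commit order

    def propose(self, zxid):
        self.pending.append(zxid)
        self.acks[zxid] = {"leader"}  # the leader acknowledges its own proposal

    def ack(self, zxid, server_id):
        self.acks[zxid].add(server_id)
        # Commit from the head of the queue while a majority has acknowledged;
        # a later proposal never commits before an earlier one.
        while self.pending and self._has_quorum(self.pending[0]):
            self.committed.append(self.pending.pop(0))

    def _has_quorum(self, zxid):
        return len(self.acks[zxid]) > self.ensemble_size // 2


leader = BroadcastLeader(ensemble_size=3)
leader.propose((1, 1))
leader.propose((1, 2))
leader.ack((1, 2), "f1")  # quorum for (1,2), but (1,1) is still pending
after_first_ack = list(leader.committed)
leader.ack((1, 1), "f1")  # now both commit, in order
print(after_first_ack, leader.committed)
```

Keeping a queue of pending proposals is what allows multiple outstanding transactions while still committing them in the order the leader proposed them.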

Zab in Phases


•   Phases 1 and 2: Recovery
    - Critical to guarantee order with multiple outstanding transactions

•   Phase 3: Broadcast
    - Just like Phases 2 and 3 of Paxos



Zab: Sample run

                  f1             f2             f3

                ⟨0,1⟩          ⟨0,1⟩          ⟨0,1⟩
                ⟨0,2⟩          ⟨0,2⟩
                ⟨0,3⟩
New epoch
                f1.a = 0,      f2.a = 0,      f3.a = 0,
                ⟨0,3⟩          ⟨0,2⟩          ⟨0,1⟩
            Initial history
            of new epoch



Zab: Sample run

                  f1             f2             f3

                ⟨0,1⟩          ⟨0,1⟩          ⟨0,1⟩
                ⟨0,2⟩          ⟨0,2⟩          ⟨0,2⟩
     Chosen!    ⟨1,1⟩          ⟨1,1⟩
                ⟨1,2⟩
New epoch

                f1.a = 1,      f2.a = 1,      f3.a = 2,
                ⟨1,2⟩          ⟨1,1⟩          ⟨0,2⟩

                           Can’t happen!


Paxos run (revisited)
              Epoch 1, Phase 3      Epoch 2, Phase 3      Epoch 3, Phase 3
              L1 History: ∅         L2 History: ∅         L3 History: ⟨2,1⟩,C

Follower 1    Epoch: 1              Epoch: 1              Epoch: 3
              ⟨1,1⟩,A               ⟨1,1⟩,A               ⟨2,1⟩,C
              ⟨1,2⟩,B               ⟨1,2⟩,B               ⟨3,1⟩,D

Follower 2    Epoch: 1              Epoch: 2              Epoch: 2
              ∅                     ⟨2,1⟩,C               ⟨2,1⟩,C

Follower 3    Epoch: 1              Epoch: 2              Epoch: 3
              ∅                     ⟨2,1⟩,C               ⟨2,1⟩,C
                                                          ⟨3,1⟩,D

Phases 1 and 2 of Epoch 2 run between the first and second columns; Phases 1 and 2 of Epoch 3 run between the second and third.



Notes on implementation

•   Use of TCP
    - Ordered delivery, retransmissions, etc.
    - Notion of session

•   Elect leader with most committed txns
    - No follower → leader copies

•   Recovery
    - Last zxid is sufficient
    - In Phase 2, leader commands to add or truncate
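
A rough sketch of how a leader might turn a follower's last zxid into add or truncate commands. This is not ZooKeeper's actual code: `sync_commands` and the command tuples are hypothetical, zxids are `(epoch, counter)` pairs, and the sketch assumes a follower's history agrees with the leader's on every transaction up to the point where they diverge.

```python
def sync_commands(leader_history, follower_last_zxid):
    """Return the commands that bring a follower in line with the leader.

    leader_history is a list of (epoch, counter) pairs in order;
    follower_last_zxid is the follower's last pair, or None if it has none.
    """
    if follower_last_zxid is None:
        return [("add", list(leader_history))]
    if follower_last_zxid in leader_history:
        # Follower is a prefix of the leader: send only what is missing.
        idx = leader_history.index(follower_last_zxid)
        return [("add", leader_history[idx + 1:])]
    # Follower holds a transaction the leader never adopted: cut back to the
    # last leader transaction that precedes it, then send the remainder.
    common = [z for z in leader_history if z < follower_last_zxid]
    keep = common[-1] if common else None  # None means drop everything
    return [("truncate", keep), ("add", leader_history[len(common):])]


print(sync_commands([(1, 1), (1, 2), (2, 1)], (1, 1)))
print(sync_commands([(2, 1)], (1, 2)))
```

The second call models a follower that still holds ⟨1,1⟩ and ⟨1,2⟩ while the new leader's history contains only ⟨2,1⟩: the follower is told to discard its history and then add ⟨2,1⟩.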

Performance
Experimental setup


•   Implementation in Java
•   13 identical servers
    - Xeon 2.50GHz, Gigabit interface, two SATA disks


                                   http://zookeeper.apache.org

Throughput

[Figure: Continuous saturated throughput. x-axis: number of servers in ensemble (2 to 14); y-axis: operations per second (0 to 70000). Series: Net only; Net + Disk; Net + Disk (no write cache); Net cap.]

Latency




Wrap up
Conclusion

•   ZooKeeper
    - Multiple outstanding operations
    - Dependencies between consecutive updates

•   Zab
    - Primary Order Broadcast
    - Synchronization phase
    - Efficient recovery


Questions?


http://zookeeper.apache.org
