SlideShare a Scribd company logo
1 of 31
Intra-cluster Replication for
        Apache Kafka
           Jun Rao
About myself
• Engineer at LinkedIn since 2010
• Worked on Apache Kafka and Cassandra
• Database researcher at IBM
Outline
•   Overview of Kafka
•   Kafka architecture
•   Kafka replication design
•   Performance
•   Q/A
What’s Kafka
• A distributed pub/sub messaging system
• Used in many places
  – LinkedIn, Twitter, Box, FourSquare …
• What do people use it for?
  – log aggregation
  – real-time event processing
  – monitoring
  – queuing
Example Kafka Apps at LinkedIn
Kafka Deployment at LinkedIn
          Live data center                               Offline data center

 Live              Live           Live
service           service        service


                              interactive data
                              (human, machine)

Monitorin
   g

                  Kafka                                   Kafka
                                                           Kafka               Hadoop
                                                                                Hadoop
                   Kafka
                    Kafka                                   Kafka                Hadoop



                 Per day stats
                 • writes: 10+ billion messages (2+TB compressed data)
                 • reads: 50+ billion messages
Kafka vs. Other Messaging Systems
•   Scale-out from groundup
•   Persistence to disks
•   High throughput (10s MB/sec per server)
•   Multi-subscription
Outline
•   Overview of Kafka
•   Kafka architecture
•   Kafka replication design
•   Performance
•   Q/A
Kafka Architecture
         Producer                            Producer




Broker              Broker   Zookeeper   Broker         Broker




         Consumer                                 Consumer
Terminologies
• Topic = message stream
• Topic has partitions
  – partitions distributed to brokers
• Partition has a log on disk
  – message persisted in log
  – message addressed by offset
API
• Producer
  messages = new List<KeyedMessage<K,V>>();
  messages.add(newKeyedMessage(“topic1”, null, “msg1”);
  send(messages);



• Consumer
  streams[] = Consumer.createMessageStream(“topic1”, 1);

  for(message: streams[0]) {
     // do something with message
  }
Deliver High Throughput
• Simple storage
                    logs in broker
                                                     msg-1
                                                     msg-2
             topic1:part1      topic2:part1          msg-3
                                                     msg-4   index
             segment-1          segment-1            msg-5
                                                      …
                                                      …
             segment-2          segment-2            msg-n



                                              read()
             segment-n          segment-n
                                              append()



• Batched writes and reads
• Zero-copy transfer from file to socket
• Compression (batched)
Outline
•   Overview of Kafka
•   Kafka architecture
•   Kafka replication design
•   Performance
•   Q/A
Why Replication
• Broker can go down
  – controlled: rolling restart for code/config push
  – uncontrolled: isolated broker failure
• If broker down
  – some partitions unavailable
  – could be permanent data loss
• Replication  higher availability and
  durability
CAP Theorem
• Pick two from
  – consistency
  – availability
  – network partitioning
Kafka Replication: Pick CA
• Brokers within a datacenter
  – i.e., network partitioning is rare
• Strong consistency
  – replicas byte-wise identical
• Highly available
  – typical failover time: < 10ms
Replicas and Layout
• Partition has replicas
• Replicas spread evenly among brokers


       logs              logs           logs           logs

    topic1-part1      topic1-part2   topic2-part1   topic2-part2

    topic2-part2      topic1-part1   topic1-part2   topic2-part1

    topic2-part1      topic2-part2   topic1-part1   topic1-part2

      broker 1          broker 2       broker 3        broker 4
Maintain Strongly Consistent Replicas
•   One of the replicas is leader
•   All writes go to leader
•   Leader propagates writes to followers in order
•   Leader decides when to commit message
Conventional Quorum-based Commit
• Wait for majority of replicas (e.g. Zookeeper)
• Plus: good latency
• Minus: 2f+1 replicas  tolerate f failures
  – ideally want to tolerate 2f failures
Commit Messages in Kafka
• Leader maintains in-sync-replicas (ISR)
  – initially, all replicas in ISR
  – message committed if received by ISR
  – follower fails  dropped from ISR
  – leader commits using new ISR
• Benefit: f replicas  tolerate f-1 failures
  – latency less an issue within datacenter
Data Flow in Replication
     producer
                                                   2
     ack            1
                                    2
                leader                  follower                    follower
                3

                commit
           4
                    topic1-part1           topic1-part1                topic1-part1
consumer

                         broker 1              broker 2                    broker 3




           When producer receives ack Latency                                     Durabilityon failures
           no ack                                      no network delay           some data loss
           wait for leader                             1 network roundtrip        a few data loss
           wait for committed                          2 network roundtrips no data loss

           Only committed messages exposed to consumers
                • independent of ack type chosen by producer
Extend to Multiple Partitions
producer




    leader                           follower             follower
           topic1-part1                  topic1-part1         topic1-part1
                          producer



                                     leader              follower              follower              producer
                                         topic2-part1        topic2-part1          topic2-part1



                                     follower            follower              leader
                                         topic3-part1         topic3-part1          topic3-part1

             broker 1                         broker 2              broker 3              broker 4




• Leaders are evenly spread among brokers
Handling Follower Failures
• Leader maintains last committed offset
  – propagated to followers
  – checkpointed to disk
• When follower restarts
  – truncate log to last committed
  – fetch data from leader
  – fully caught up  added to ISR
Handling Leader Failure
• Use an embedded controller (inspired by Helix)
  – detect broker failure via Zookeeper
  – on leader failure: elect new leader from ISR
  – committed messages not lost
• Leader and ISR written to Zookeeper
  – for controller failover
  – expected to change infrequently
Example of Replica Recovery
1. ISR = {A,B,C}; Leader A commits message m1;
                  L (A)   F (B)        F (C)
                  m1       m1           m1
last committed             m2
                  m2
                  m3

2. A fails and B is new leader; ISR = {B,C}; B commits m2, but not m3
                  L (A)   L (B)        F (C)
                  m1       m1           m1
                  m2       m2           m2
                  m3

3. B commits new messages m4, m5
                  L (A)   L (B)        F (C)
                  m1       m1           m1
                  m2       m2           m2
                  m3       m4           m4
                           m5           m5


4. A comes back, truncates to m1 and catches up; finally ISR = {A,B,C}
                  F (A)   L (B)        F (C)                 F (A)       L (B)   F (C)
                  m1       m1           m1                   m1           m1      m1
                           m2           m2                   m2           m2      m2
                           m4           m4                   m4           m4      m4
                           m5           m5                   m5           m5      m5
Outline
•   Overview of Kafka
•   Kafka architecture
•   Kafka replication design
•   Performance
•   Q/A
Setup
•   3 brokers
•   1 topic with 1 partition
•   Replication factor=3
•   Message size = 1KB
Choosing btw Latency and Durability


    When producer      Time to publish Durabilityon
    receives ack       a message (ms) failures
    no ack             0.29            some data loss
    wait for leader    1.05            a few data loss
    wait for committed 2.05            no data loss
Producer Throughput

            varying messages per send                               varying # concurrent producers
       70                                                      70
       60                                                      60
       50                                                      50
MB/s




                                                        MB/s
       40                                                      40
                                            no ack                                              no ack
       30                                                      30
       20                                   leader             20                               leader
       10                                   committed          10                               committed
        0                                                       0
             1     10       100      1000                              1     5        10   20
                 messages per send                                           # producers
Consumer Throughput

               throughput vs fetch size
        100

         80

         60
 MB/s




         40

         20

          0
              1KB      10KB                100KB   1MB
                              fetch size
Q/A
• Kafka 0.8.0 (intra-cluster replication)
  – expected to be released in Mar
  – various performance improvements in the future
• Checkout more about Kafka
  – http://kafka.apache.org/
• Kafka meetup tonight

More Related Content

What's hot

Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connectconfluent
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드confluent
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraEric Evans
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 

What's hot (20)

Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connect
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 

Similar to Kafka replication apachecon_2013

Kafka Evaluation - High Throughout Message Queue
Kafka Evaluation - High Throughout Message QueueKafka Evaluation - High Throughout Message Queue
Kafka Evaluation - High Throughout Message QueueShafaq Abdullah
 
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...Antonio Garrote Hernández
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafkaconfluent
 
Kafka Technical Overview
Kafka Technical OverviewKafka Technical Overview
Kafka Technical OverviewSylvester John
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking VN
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Sidd...
Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Sidd...Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Sidd...
Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Sidd...slashn
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows AzureDamir Dobric
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
DevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMDevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMGuillaume Arnaud
 
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...Paolo Negri
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)Erhwen Kuo
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafkaJiangjie Qin
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedGuozhang Wang
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupSnehal Nagmote
 
MyHeritage Kakfa use cases - Feb 2014 Meetup
MyHeritage Kakfa use cases - Feb 2014 Meetup MyHeritage Kakfa use cases - Feb 2014 Meetup
MyHeritage Kakfa use cases - Feb 2014 Meetup Ran Levy
 
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...Jeff Malek
 

Similar to Kafka replication apachecon_2013 (20)

Kafka Evaluation - High Throughout Message Queue
Kafka Evaluation - High Throughout Message QueueKafka Evaluation - High Throughout Message Queue
Kafka Evaluation - High Throughout Message Queue
 
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
 
Kafka Technical Overview
Kafka Technical OverviewKafka Technical Overview
Kafka Technical Overview
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocolsGrokking TechTalk #24: Kafka's principles and protocols
Grokking TechTalk #24: Kafka's principles and protocols
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Sidd...
Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Sidd...Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Sidd...
Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Sidd...
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows Azure
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
DevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMDevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoM
 
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
MyHeritage Kakfa use cases - Feb 2014 Meetup
MyHeritage Kakfa use cases - Feb 2014 Meetup MyHeritage Kakfa use cases - Feb 2014 Meetup
MyHeritage Kakfa use cases - Feb 2014 Meetup
 
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
 

Kafka replication apachecon_2013

  • 1. Intra-cluster Replication for Apache Kafka Jun Rao
  • 2. About myself • Engineer at LinkedIn since 2010 • Worked on Apache Kafka and Cassandra • Database researcher at IBM
  • 3. Outline • Overview of Kafka • Kafka architecture • Kafka replication design • Performance • Q/A
  • 4. What’s Kafka • A distributed pub/sub messaging system • Used in many places – LinkedIn, Twitter, Box, FourSquare … • What do people use it for? – log aggregation – real-time event processing – monitoring – queuing
  • 5. Example Kafka Apps at LinkedIn
  • 6. Kafka Deployment at LinkedIn Live data center Offline data center Live Live Live service service service interactive data (human, machine) Monitorin g Kafka Kafka Kafka Hadoop Hadoop Kafka Kafka Kafka Hadoop Per day stats • writes: 10+ billion messages (2+TB compressed data) • reads: 50+ billion messages
  • 7. Kafka vs. Other Messaging Systems • Scale-out from groundup • Persistence to disks • High throughput (10s MB/sec per server) • Multi-subscription
  • 8. Outline • Overview of Kafka • Kafka architecture • Kafka replication design • Performance • Q/A
  • 9. Kafka Architecture Producer Producer Broker Broker Zookeeper Broker Broker Consumer Consumer
  • 10. Terminologies • Topic = message stream • Topic has partitions – partitions distributed to brokers • Partition has a log on disk – message persisted in log – message addressed by offset
  • 11. API • Producer messages = new List<KeyedMessage<K,V>>(); messages.add(newKeyedMessage(“topic1”, null, “msg1”); send(messages); • Consumer streams[] = Consumer.createMessageStream(“topic1”, 1); for(message: streams[0]) { // do something with message }
  • 12. Deliver High Throughput • Simple storage logs in broker msg-1 msg-2 topic1:part1 topic2:part1 msg-3 msg-4 index segment-1 segment-1 msg-5 … … segment-2 segment-2 msg-n read() segment-n segment-n append() • Batched writes and reads • Zero-copy transfer from file to socket • Compression (batched)
  • 13. Outline • Overview of Kafka • Kafka architecture • Kafka replication design • Performance • Q/A
  • 14. Why Replication • Broker can go down – controlled: rolling restart for code/config push – uncontrolled: isolated broker failure • If broker down – some partitions unavailable – could be permanent data loss • Replication  higher availability and durability
  • 15. CAP Theorem • Pick two from – consistency – availability – network partitioning
  • 16. Kafka Replication: Pick CA • Brokers within a datacenter – i.e., network partitioning is rare • Strong consistency – replicas byte-wise identical • Highly available – typical failover time: < 10ms
  • 17. Replicas and Layout • Partition has replicas • Replicas spread evenly among brokers logs logs logs logs topic1-part1 topic1-part2 topic2-part1 topic2-part2 topic2-part2 topic1-part1 topic1-part2 topic2-part1 topic2-part1 topic2-part2 topic1-part1 topic1-part2 broker 1 broker 2 broker 3 broker 4
  • 18. Maintain Strongly Consistent Replicas • One of the replicas is leader • All writes go to leader • Leader propagates writes to followers in order • Leader decides when to commit message
  • 19. Conventional Quorum-based Commit • Wait for majority of replicas (e.g. Zookeeper) • Plus: good latency • Minus: 2f+1 replicas  tolerate f failures – ideally want to tolerate 2f failures
  • 20. Commit Messages in Kafka • Leader maintains in-sync-replicas (ISR) – initially, all replicas in ISR – message committed if received by ISR – follower fails  dropped from ISR – leader commits using new ISR • Benefit: f replicas  tolerate f-1 failures – latency less an issue within datacenter
  • 21. Data Flow in Replication producer 2 ack 1 2 leader follower follower 3 commit 4 topic1-part1 topic1-part1 topic1-part1 consumer broker 1 broker 2 broker 3 When producer receives ack Latency Durabilityon failures no ack no network delay some data loss wait for leader 1 network roundtrip a few data loss wait for committed 2 network roundtrips no data loss Only committed messages exposed to consumers • independent of ack type chosen by producer
  • 22. Extend to Multiple Partitions producer leader follower follower topic1-part1 topic1-part1 topic1-part1 producer leader follower follower producer topic2-part1 topic2-part1 topic2-part1 follower follower leader topic3-part1 topic3-part1 topic3-part1 broker 1 broker 2 broker 3 broker 4 • Leaders are evenly spread among brokers
  • 23. Handling Follower Failures • Leader maintains last committed offset – propagated to followers – checkpointed to disk • When follower restarts – truncate log to last committed – fetch data from leader – fully caught up  added to ISR
  • 24. Handling Leader Failure • Use an embedded controller (inspired by Helix) – detect broker failure via Zookeeper – on leader failure: elect new leader from ISR – committed messages not lost • Leader and ISR written to Zookeeper – for controller failover – expected to change infrequently
  • 25. Example of Replica Recovery 1. ISR = {A,B,C}; Leader A commits message m1; L (A) F (B) F (C) m1 m1 m1 last committed m2 m2 m3 2. A fails and B is new leader; ISR = {B,C}; B commits m2, but not m3 L (A) L (B) F (C) m1 m1 m1 m2 m2 m2 m3 3. B commits new messages m4, m5 L (A) L (B) F (C) m1 m1 m1 m2 m2 m2 m3 m4 m4 m5 m5 4. A comes back, truncates to m1 and catches up; finally ISR = {A,B,C} F (A) L (B) F (C) F (A) L (B) F (C) m1 m1 m1 m1 m1 m1 m2 m2 m2 m2 m2 m4 m4 m4 m4 m4 m5 m5 m5 m5 m5
  • 26. Outline • Overview of Kafka • Kafka architecture • Kafka replication design • Performance • Q/A
  • 27. Setup • 3 brokers • 1 topic with 1 partition • Replication factor=3 • Message size = 1KB
  • 28. Choosing btw Latency and Durability When producer Time to publish Durabilityon receives ack a message (ms) failures no ack 0.29 some data loss wait for leader 1.05 a few data loss wait for committed 2.05 no data loss
  • 29. Producer Throughput varying messages per send varying # concurrent producers 70 70 60 60 50 50 MB/s MB/s 40 40 no ack no ack 30 30 20 leader 20 leader 10 committed 10 committed 0 0 1 10 100 1000 1 5 10 20 messages per send # producers
  • 30. Consumer Throughput throughput vs fetch size 100 80 60 MB/s 40 20 0 1KB 10KB 100KB 1MB fetch size
  • 31. Q/A • Kafka 0.8.0 (intra-cluster replication) – expected to be released in Mar – various performance improvements in the future • Checkout more about Kafka – http://kafka.apache.org/ • Kafka meetup tonight