
Kafka Needs No Keeper

Video and slides synchronized; mp3 and slide download available at https://bit.ly/2y2yPiS.

Colin McCabe talks about the ongoing effort to replace the use of ZooKeeper in Kafka: why the community wants to do it and how it will work. He discusses the limitations they have found and how Kafka benefits, in terms of both stability and scalability, by bringing consensus in-house. He covers their progress, what work remains, and how contributors can help. Filmed at qconsf.com.

Colin McCabe is a Kafka committer at Confluent, working on the scalability and extensibility of Kafka. Previously, he worked on the Hadoop Distributed Filesystem and the Ceph Filesystem.


Kafka Needs No Keeper

  1. Kafka Needs No Keeper (Colin McCabe)
  2. InfoQ.com: News & Community Site
     ● Over 1,000,000 software developers, architects and CTOs read the site worldwide every month
     ● 250,000 senior developers subscribe to our weekly newsletter
     ● Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
     ● Post content from our QCon conferences
     ● 2 dedicated podcast channels: The InfoQ Podcast, with a focus on architecture, and The Engineering Culture Podcast
     ● 96 deep dives on innovative topics packed as downloadable emags and minibooks
     ● Over 40 new content items per week
     Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/kafka-zookeeper/
  3. Presented at QCon San Francisco (www.qconsf.com)
     Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation.
     Strategy: a practitioner-driven conference designed for you, the influencers of change and innovation in your teams; speakers and topics driving evolution and innovation; connecting and catalyzing the influencers and innovators.
     Highlights: attended by more than 12,000 delegates since 2007; held in 9 cities worldwide.
  4. Introduction
     ● Kafka has gotten a lot of mileage out of ZooKeeper
     ● But it is still a second system
     ● KIP-500 has been adopted by the community
     ● This is not a 1:1 replacement
     ● We've been headed in this direction for years
  5. Evolution of Apache Kafka Clients
  6-10. (diagram, built up across several slides) Producer, Consumer, and Admin Tools: producers write to topics; consumers read from topics, fetch/commit offsets, and receive group partition assignments; admin tools create/delete topics.
  11. Consumer Group Coordinator
  12. (diagram) Consumer: read from topics; offset fetch/commit; group partition assignment.
  13-14. Consumer APIs: ● Fetch (read from topics)
  15. Consumer APIs: ● Fetch. Committed offsets live in the __offsets topic.
  16-18. Consumer APIs: ● Fetch ● OffsetCommit ● OffsetFetch (offset fetch/commit now goes through the broker to the __offsets topic)
  19-22. Consumer APIs: ● Fetch ● OffsetCommit ● OffsetFetch ● JoinGroup ● SyncGroup ● Heartbeat (group partition assignment also moves behind broker APIs; see the sketch below)
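As a rough illustration of how these APIs surface in a client, here is a minimal KafkaConsumer sketch: subscribe() drives JoinGroup/SyncGroup, poll() drives Fetch and Heartbeat, and commitSync() drives OffsetCommit. The bootstrap address, group id, and topic name are placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerApiSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "example-group");           // placeholder group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe() leads to JoinGroup/SyncGroup with the group coordinator.
            consumer.subscribe(List.of("example-topic"));
            while (true) {
                // poll() issues Fetch requests and keeps Heartbeats flowing.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d@%d: %s%n", record.topic(),
                            record.partition(), record.offset(), record.value());
                }
                // commitSync() issues OffsetCommit; offsets land in __offsets.
                consumer.commitSync();
            }
        }
    }
}
```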
  23. (diagram) Producer, Consumer, Admin Tools: admin tools create/delete topics.
  24. Kafka Security and the Admin Client
  25-28. (diagram, several slides) Admin tools create/delete topics without passing through the brokers' ACL enforcement.
  29. (diagram) AdminClient: admin tools now create/delete topics through the brokers, subject to ACL enforcement.
  30-31. Admin APIs: ● CreateTopics ● DeleteTopics ● AlterConfigs ● ... (see the sketch below)
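A minimal sketch of these Admin APIs in use, assuming a broker at a placeholder address and illustrative topic names. Because the request goes to a broker rather than straight to ZooKeeper, the broker can enforce ACLs before changing anything.

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class AdminApiSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // CreateTopics: 3 partitions, replication factor 2 (illustrative).
            NewTopic topic = new NewTopic("example-topic", 3, (short) 2);
            admin.createTopics(List.of(topic)).all().get();

            // DeleteTopics: the broker checks ACLs before acting.
            admin.deleteTopics(List.of("old-topic")).all().get();
        }
    }
}
```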
  32-33. Client APIs (Producer, Consumer, AdminClient): ● Produce ● Fetch ● Metadata ● CreateTopics ● DeleteTopics ● ... Routing everything through client APIs gives us: ● Encapsulation ● Security ● Validation ● Compatibility
  34. Inter-Broker Communication
  36. (diagram) Broker registration, ACL management, dynamic configuration, ISR management.
  37. (diagram) The Controller, alongside broker registration, ACL management, dynamic configuration, ISR management.
  38-40. (diagram, several slides) Controller election joins the picture: broker registration, ACL management, dynamic configuration, ISR management, controller election.
  41-43. Controller APIs: ● LeaderAndIsr ● UpdateMetadata ● StopReplica (diagram: leader/ISR push, update metadata, stop/delete replica)
  44-45. Controller APIs: ● LeaderAndIsr ● UpdateMetadata ● StopReplica ● AlterIsr (ISR management moves behind a broker API)
  47. ● Encapsulation ● Compatibility ● Ownership
  48. Broker Liveness
  49. ZK session
  50-51. /brokers/1 -> { host: 10.10.10.1:9092, rack: rack-1 }
  53-54. Watch trigger: broker 1 is offline. (See the liveness sketch below.)
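The mechanism behind these slides, sketched with the plain ZooKeeper Java client: a broker registers an ephemeral znode tied to its session, and a watch fires when the session expires. The connect string, session timeout, path, and payload are illustrative, and the sketch assumes the parent path already exists (as it does in a real Kafka cluster).

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class BrokerLivenessSketch {
    public static void main(String[] args) throws Exception {
        // The session timeout bounds how quickly a dead broker is noticed.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 6000, event -> {});

        // Broker side: an ephemeral znode is tied to this session and
        // vanishes automatically when the session expires.
        byte[] registration = "{ host: 10.10.10.1:9092, rack: rack-1 }".getBytes();
        zk.create("/brokers/ids/1", registration,
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Controller side: a watch on the registration path fires when
        // the set of live brokers changes ("Broker 1 is offline").
        zk.getChildren("/brokers/ids",
                event -> System.out.println("Watch trigger: " + event));

        Thread.sleep(Long.MAX_VALUE); // keep the session alive for the demo
    }
}
```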
  55. Network Partition Resilience
  57-60. (diagrams)
     ● Case 1: Total partition
     ● Case 2: Broker partition
     ● Case 3: ZK partition
     ● Case 4: Controller partition
  61. Metadata Inconsistency
  63-65. (diagram, several slides) Metadata source of truth, with a metadata cache doing sync writes and async updates, and further metadata caches receiving async updates.
  69-75. (several slides) Last resort: > rmr /controller. A new controller is elected, loads ALL metadata, and pushes ALL metadata. How do you know the metadata has diverged?
  76. Performance of Controller Initialization
  79-81. (several slides) New controller! It must load ALL metadata. Complexity: O(N), N = number of partitions.
  83-84. (several slides) It must then push ALL metadata. Complexity: O(N*M), N = number of partitions, M = number of brokers. (An illustrative calculation follows below.)
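To make that concrete with illustrative numbers (not from the talk): at N = 100,000 partitions and M = 100 brokers, a controller failover loads 100,000 partition states from ZooKeeper and then pushes on the order of N*M = 10,000,000 partition-state updates across the cluster.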
  85. Metadata as an Event Log
  86. Metadata as an Event Log
     - Each change becomes a message
     - Changes are propagated to all brokers
     ...
     924 Create topic "foo"
     925 Delete topic "bar"
     926 Add node 4 to the cluster
     927 Create topic "baz"
     928 Alter ISR for "foo-0"
     929 Add node 5 to the cluster
  87. Metadata as an Event Log
     - Clear ordering
     - Can send deltas
     - Offset tracks consumer position
     - Easy to measure lag
     (same example log as above; see the sketch below)
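A toy sketch of the idea, using the slide's example events. The record layout is purely illustrative, not the actual KIP-500 record schema: the point is that an offset gives a total order, lag is a subtraction, and a follower only needs the delta past its own offset.

```java
import java.util.List;

public class MetadataLogSketch {
    // One metadata change per record; the offset gives a total order.
    record MetadataRecord(long offset, String event) {}

    public static void main(String[] args) {
        List<MetadataRecord> log = List.of(
                new MetadataRecord(924, "Create topic \"foo\""),
                new MetadataRecord(925, "Delete topic \"bar\""),
                new MetadataRecord(926, "Add node 4 to the cluster"),
                new MetadataRecord(927, "Create topic \"baz\""),
                new MetadataRecord(928, "Alter ISR for \"foo-0\""),
                new MetadataRecord(929, "Add node 5 to the cluster"));

        // A broker that has applied everything up to offset 926...
        long brokerOffset = 926;

        // ...can measure its lag with a subtraction...
        long headOffset = log.get(log.size() - 1).offset();
        System.out.println("lag = " + (headOffset - brokerOffset));

        // ...and only needs the delta, not a full metadata reload.
        log.stream()
           .filter(r -> r.offset() > brokerOffset)
           .forEach(r -> System.out.println("apply: " + r.event()));
    }
}
```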
  88. (diagram) Three consumers reading a log at offset=3, offset=1, offset=2.
  89. (diagram) Three brokers consuming the metadata log at offset=3, offset=1, offset=2. But who manages the log?
  90. (diagram) The Controller manages the metadata log; brokers consume it at offset=3, offset=1, offset=2.
  91-92. Implementing the Controller Log. Can we use the existing Kafka log replication protocol? How do we elect the leader? We need a self-managed quorum. Enter Raft: leader election is by simple majority.
  93. Kafka replication vs. Raft:
                           Kafka                            Raft
     Writes                single leader                    single leader
     Fencing               monotonically increasing epoch   monotonically increasing term
     Log reconciliation    offset and epoch                 term and index
     Push/pull             pull                             push
     Commit semantics      ISR                              majority
     Leader election       from the ISR, through ZooKeeper  majority
     (See the commit-rule sketch below.)
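A sketch of the commit-semantics row, with hypothetical helper functions: Kafka replication commits a record once every member of the current ISR has it, while Raft commits once a strict majority of the quorum has it.

```java
import java.util.Collection;

public class CommitRuleSketch {
    // Kafka replication: committed once every in-sync replica has the
    // record (the ISR set itself is maintained out of band).
    static boolean kafkaCommitted(Collection<Long> isrAckedOffsets, long offset) {
        return isrAckedOffsets.stream().allMatch(acked -> acked >= offset);
    }

    // Raft: committed once a strict majority of the quorum has the
    // record; there is no externally managed replica set.
    static boolean raftCommitted(Collection<Long> ackedOffsets, long offset,
                                 int quorumSize) {
        long acks = ackedOffsets.stream().filter(acked -> acked >= offset).count();
        return acks > quorumSize / 2;
    }
}
```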
  94. The Controller Quorum
  95. The Controller Raft Quorum (diagram: three controllers, brokers at offset=1 and offset=2)
     - The leader is the active controller
     - Controls reads/writes to the log
     - Typically 3 or 5 nodes, like ZK
  96. Instant Failover
     - Low-latency failover via Raft election
     - Standbys contain all data in memory
     - Brokers do not need to re-fetch
  97. Metadata Caching
     - Brokers can persist metadata to disk (e.g. /mnt/logs/kafka/metadata)
     - Only fetch what they need
     - Use snapshots if we're too far behind
     (See the catch-up sketch below.)
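A sketch of that catch-up logic under stated assumptions: all of the interfaces and method names here (ControllerClient, fetchSnapshot, fetchChangesSince, and so on) are hypothetical stand-ins, not real Kafka APIs.

```java
public class MetadataCatchUpSketch {
    // Hypothetical stand-ins for the controller-facing RPCs.
    interface ControllerClient {
        long logStartOffset();                        // oldest retained offset
        MetadataSnapshot fetchSnapshot();             // full state, if too far behind
        MetadataDelta fetchChangesSince(long offset); // just the missing changes
    }
    interface MetadataSnapshot { long endOffset(); }
    interface MetadataDelta { long endOffset(); }

    // On startup the broker resumes from the offset it persisted in its
    // local metadata log rather than re-reading the whole cluster state.
    static long catchUp(ControllerClient controller, long persistedOffset) {
        long offset = persistedOffset;
        if (offset < controller.logStartOffset()) {
            // Too far behind: the delta has been truncated away, so
            // bootstrap from a snapshot first.
            offset = controller.fetchSnapshot().endOffset();
        }
        // Then fetch only what we are missing.
        return controller.fetchChangesSince(offset).endOffset();
    }
}
```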
  98. Broker Registration
     - Building a map of the cluster
     - What brokers exist in the cluster?
     - How can they be reached?
  99-100. Broker Registration
     - Brokers send heartbeats to the active controller
     - The controller uses this to build a map of the cluster
     - The controller also tells brokers if they should be fenced or shut down
  101. Fencing
     - Brokers need to be fenced if they're partitioned from the controller, or can't keep up
     - Brokers self-fence if they can't talk to the controller
     (See the heartbeat sketch below.)
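A sketch of that heartbeat-and-self-fence behavior; the ControllerChannel interface and the timing values are hypothetical, not the real protocol.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BrokerHeartbeatSketch {
    // Hypothetical stand-in for the broker-to-controller heartbeat RPC.
    interface ControllerChannel {
        boolean sendHeartbeat(int brokerId) throws Exception;
    }

    private volatile long lastAckMs = System.currentTimeMillis();
    private volatile boolean fenced = false;

    boolean isFenced() { return fenced; }

    void start(ControllerChannel controller, int brokerId, long sessionTimeoutMs) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                if (controller.sendHeartbeat(brokerId)) {
                    lastAckMs = System.currentTimeMillis();
                    fenced = false; // the controller can see us again
                }
            } catch (Exception e) {
                // Heartbeat failed; fall through to the timeout check.
            }
            if (System.currentTimeMillis() - lastAckMs > sessionTimeoutMs) {
                fenced = true; // self-fence: stop serving clients
            }
        }, 0, 2, TimeUnit.SECONDS); // illustrative heartbeat interval
    }
}
```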
  102. Handling Network Partitions
  103-107. (diagrams)
     ● Case 1: Total partition
     ● Case 2: Broker partition
     ● Case 3: Controller partition
  108. Deployment
                              Current                                    KIP-500
     Configuration file       Kafka and ZooKeeper                        Kafka
     Metrics                  Kafka and ZK                               Kafka
     Administrative tools     ZK shell, four-letter words, Kafka tools   Kafka tools
     Security                 Kafka and ZK                               Kafka
  109. Shared Controller Nodes
     - Fewer resources used
     - Single-node clusters (eventually)
  110. Separate Controller Nodes
     - Better resource isolation
     - Good for big clusters
  111. Roadmap
  112. Roadmap phases: remove client-side ZK dependencies; remove broker-side ZK dependencies; controller quorum.
  113. Remove client-side ZK dependencies: incremental KIP-4 improvements
     - Create new APIs
     - Deprecate direct ZK access
  114. Remove broker-side ZK dependencies: broker-side fixes
     - Remove deprecated direct ZK access for tools
     - Create broker-side APIs
     - Centralize ZK access in the controller
  115. Controller quorum: first release without ZooKeeper
     - Raft
     - Controller quorum
  116. Upgrade issues going from an older Kafka release to the KIP-500 release:
     - Tools using ZK
     - Brokers accessing ZK
     - State in ZK
  117. Bridge Release: sits between older Kafka releases and the KIP-500 release. No ZK access from tools or brokers (except the controller).
  118. Upgrading: starting from the bridge release.
  119. Upgrading
     - Start new controller nodes (possibly combined)
     - The quorum elects a leader
     - The leader claims leadership in ZK
  120. Upgrading
     - Roll nodes one by one as usual
     - The controller continues sending LeaderAndIsr, etc. to old nodes
  121. Upgrading
     - When all brokers have been rolled, decommission the ZK nodes
  122. Conclusion
  123. Apache ZooKeeper has served us well
     - KIP-500 is not a 1:1 replacement, but a different paradigm
     We have already started removing ZK from clients
     - Consumer, AdminClient
     - Improved encapsulation, security, upgradability
  124. Metadata should be managed as a log
     - Deltas, ordering, caching
     - Controller failover, fencing
     - Improved scalability, robustness, easier deployment
     The metadata log must be self-managed
     - Raft
     - Controller quorum
  125. It will take a few releases to implement KIP-500
     - Additional KIPs for APIs, Raft, metadata, etc.
     Rolling upgrades will be supported
     - Bridge release
     - Post-ZK release
     Kafka needs no Keeper
  126. THANK YOU. Colin McCabe, cmccabe@confluent.io. cnfl.io/meetups | cnfl.io/blog | cnfl.io/slack
  127. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/kafka-zookeeper/
