When One Data Center is not Enough
Guozhang Wang Strata San Jose, 2016
Building large-scale stream infrastructure across multiple data centers with Apache Kafka
2
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
3
Why across Data Centers?
4
Why across Data Centers
• Catastrophic / expected failures
• Routine maintenance
• Geo-locality (Example: CDNs)
5
Why NOT across Data Centers
• Low bandwidth (10Mbps - 1Gbps)
• High latency (50ms - 450ms)
• Much More $$$
6
Why NOT across Data Centers
• … is hard and expensive
7
Why NOT across Data Centers
• … is hard and expensive
• … with real-time writes? Harder
8
Why NOT across Data Centers
• … is hard and expensive
• … with real-time writes? Harder
• … consistently? Oh My!
9
Consistency
• Weak
• Eventual
• Strong
Latency Guarantee
10
Weak / No Consistency
• Now you see my writes, now you don’t
• Best effort only, data can be stale
• Examples: think of “caches”, VoIP
11
Eventual Consistency
• You will see my writes, … eventually
• May need to resolve conflicts (manually)
• Examples: think of “emails”, SMTP
12
Strong Consistency
• You get what you write, for sure
• External > Sequential > Causal (Session)
• Examples: RDBMS, file systems
13
Latency vs. Consistency
• LAN: consistency over latency
• WAN: latency over consistency
14
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
15
Option I: Don’t do it
• Bunkerize the single data center
• Expect data loss at failures
• Examples: ??
16
Option II: Primary with Hot Standby
• Failover to hot standby (maybe inconsistent)
• Window of data loss at failures
• Examples: MySQL binlog
17
Option III: Active-Active
• Accepts writes in multi-DC
• Resolve conflicts (strong / weak consistency)
• Examples: Amazon DynamoDB (vector clock), Google Spanner (2PC), Mesa (Paxos)
18
Ordering is the Key!
19
Ordering is Key
• Vector clocks: partial ordering
• Paxos, 2PC: global ordering
• Log shipping: logical ordering (per-partition)
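To make "partial ordering" concrete, here is a minimal sketch of comparing vector clocks. The `compare` function and the `dc1` / `dc2` node names are illustrative assumptions, not taken from the talk. Two writes are ordered only when one clock dominates the other; otherwise they are concurrent and the conflict has to be resolved some other way.

```python
def compare(a, b):
    """Compare two vector clocks given as dicts of node name -> counter."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Neither clock dominates: the writes are unordered.
    return "concurrent"


w1 = {"dc1": 1}            # first write, accepted in dc1
w2 = {"dc1": 2}            # dc1 writes again after w1
w3 = {"dc1": 1, "dc2": 1}  # dc2 writes after seeing w1 but not w2

print(compare(w1, w2))  # w1 happened before w2
print(compare(w2, w3))  # no order between w2 and w3
```

A log, by contrast, assigns every message in a partition a position, so any two messages in the same partition are always ordered.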
21
Apache Kafka
• A distributed messaging system
  ... that stores messages as a log!
22
Store Messages as a Log
[Figure: a log of messages at increasing offsets (3, 4, 5, ... 12, ...). The producer writes at the end of the log; Consumer1 reads at offset 7 and Consumer2 reads at offset 10.]
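The log abstraction can be sketched in a few lines. This is an in-memory toy, assuming a hypothetical `Log` class; it is not Kafka's storage format. It only shows the two properties the slide relies on: appends go to the end and return an offset, and each consumer tracks its own read position independently.

```python
class Log:
    """Append-only sequence of messages addressed by offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Add a message at the end and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=1):
        """Return up to max_messages starting at the given offset."""
        return self._messages[offset:offset + max_messages]


log = Log()
for i in range(13):
    log.append(f"m{i}")  # offsets 0..12

# Each consumer only needs to remember one number: its next offset.
consumer1_offset = 7
consumer2_offset = 10
print(log.read(consumer1_offset))
print(log.read(consumer2_offset, max_messages=2))
```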
23
Partition the Log across Machines
[Figure: Topic 1 and Topic 2 are each split into partitions hosted on different brokers; producers write to the partitions and consumers read from them.]
24
Configurable ISR Commits
ACK mode    Latency                  On Failures
"no"        no network delay         some data loss
"leader"    1 network roundtrip      a little data loss
"all"       ~2 network roundtrips    no data loss
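As a rough sketch, the three modes map onto the Kafka producer `acks` setting ("0", "1", "all"). The helper names, the broker address, and the assumption that every hop costs the same round-trip time are illustrative only; the round-trip counts are the ones from the table.

```python
# Kafka producer `acks` values for the three modes in the table.
ACKS = {"no": "0", "leader": "1", "all": "all"}

# Network round trips per mode, as listed in the table.
ROUND_TRIPS = {"no": 0, "leader": 1, "all": 2}


def producer_config(mode, bootstrap_servers):
    """Build a producer config dict for one of the table's modes."""
    return {"bootstrap.servers": bootstrap_servers, "acks": ACKS[mode]}


def estimated_ack_latency_ms(mode, rtt_ms):
    """Rough wait for an ack, assuming every round trip costs rtt_ms."""
    return ROUND_TRIPS[mode] * rtt_ms


# "broker.example:9092" is a placeholder address, not a real endpoint.
config = producer_config("all", "broker.example:9092")

# With the 50ms - 450ms cross-DC latency quoted earlier in the deck:
low = estimated_ack_latency_ms("all", rtt_ms=50)
high = estimated_ack_latency_ms("all", rtt_ms=450)
print(config, low, high)
```

Under these assumptions, waiting for all in-sync replicas across data centers costs roughly 100 ms to 900 ms per acknowledged write, which is why the choice of mode matters much more over a WAN than inside one data center.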
25
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
26
Option I: Active-Passive Replication
[Figure: DC 1 holds the local Kafka cluster, fed by producers; MirrorMaker copies its data to the replica Kafka cluster in DC 2; consumers read from the clusters.]
27
Option I: Active-Passive Replication
• Async replication across DCs
• May lose data on failover
• Example: ETL to data warehouse / HDFS
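The data-loss window of asynchronous mirroring can be shown with a small simulation. This is a sketch under simplifying assumptions: `Cluster` and `mirror` are hypothetical stand-ins for a Kafka cluster and a MirrorMaker copy step, and the logs are plain lists.

```python
class Cluster:
    """Stand-in for a Kafka cluster holding a single log."""

    def __init__(self):
        self.log = []


def mirror(source, target, position, batch_size):
    """Copy up to batch_size messages from source to target.

    Returns the new mirror position in the source log.
    """
    batch = source.log[position:position + batch_size]
    target.log.extend(batch)
    return position + len(batch)


dc1, dc2 = Cluster(), Cluster()

# Producers in DC 1 are acknowledged locally, before mirroring happens.
dc1.log.extend(f"event-{i}" for i in range(10))

# The mirror has only caught up with the first 7 messages...
position = mirror(dc1, dc2, position=0, batch_size=7)

# ...when DC 1 fails. Anything not yet copied is missing from DC 2.
lost = dc1.log[position:]
print(len(dc2.log), lost)
```

The writes in `lost` were acknowledged to producers in DC 1 but never reached DC 2, which is the "may lose data on failover" caveat.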
28
Option II: Active-Active Replication
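[Figure: DC 1 and DC 2 each run a local Kafka cluster fed by producers and an aggregate Kafka cluster read by a consumer; MirrorMaker copies the local clusters into the aggregate clusters; an arrow labeled "on DC1 failure" marks the failover to DC 2.]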
29
Option II: Active-Active Replication
• Global view on the aggregate cluster
• Requires offsets to resume
• Example: store materialization, index updates
30
Caveats: offsets across DCs
• Offsets not identical between Kafka clusters
• Duplicates during failover
• Partition selection may be different
• Solutions
  • Resume from log end offset (suitable for real-time apps)
  • Resume from a timestamp (ListOffsets, offset index: KIP-33)
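Resuming from a timestamp can be sketched as follows. The helper `offset_for_timestamp` and the sample logs are hypothetical, and the sketch assumes timestamps never decrease within a log. It returns the earliest offset whose timestamp is at or after the requested one, which is the kind of lookup a time-based offset index makes cheap.

```python
import bisect


def offset_for_timestamp(log, timestamp):
    """Earliest offset whose timestamp is >= timestamp, or None.

    `log` is a list of (timestamp, payload) pairs; the list index is
    the offset. Assumes timestamps are non-decreasing.
    """
    timestamps = [ts for ts, _ in log]
    offset = bisect.bisect_left(timestamps, timestamp)
    return offset if offset < len(log) else None


# The same stream as seen by the aggregate clusters in two DCs. DC 2
# holds an extra older message and a duplicate, so offsets do not match.
dc1_agg = [(100, "a"), (101, "x"), (105, "b"), (110, "c"), (112, "y")]
dc2_agg = [(90, "old"), (100, "a"), (100, "a"), (101, "x"),
           (105, "b"), (110, "c"), (112, "y")]

# The consumer last processed "b" (timestamp 105) at offset 2 in DC 1.
last_offset_dc1 = 2
last_timestamp = dc1_agg[last_offset_dc1][0]

# Reusing the DC 1 offset in DC 2 would land on the wrong message.
wrong = dc2_agg[last_offset_dc1][1]

# Looking the position up by timestamp lands on "b" again.
resume_at = offset_for_timestamp(dc2_agg, last_timestamp)
print(wrong, resume_at, dc2_agg[resume_at][1])
```

Note that the consumer sees "b" a second time after failover. Resuming by timestamp gives at-least-once delivery, so downstream processing has to tolerate duplicates.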
31
Option III: Deploy across DCs
[Figure: a single Kafka cluster spanning DC 1 and DC 2, with producers and a consumer in each data center.]
32
Option III: Deploy across DCs
• Multi-tenancy support
  • Security (0.9)
  • Quota Management (0.9)
• Latency optimization
  • Rack-aware partition assignment (0.10)
  • Read affinity (future?)
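The idea behind rack-aware assignment is that each partition's replicas should land in different racks, or here, different data centers. The following is a simplified sketch of that idea and not Kafka's actual assignment algorithm; `rack_alternating`, `assign`, and the broker layout are assumptions made for illustration.

```python
from itertools import zip_longest


def rack_alternating(brokers_by_rack):
    """Order brokers so that neighbours come from different racks."""
    columns = zip_longest(*brokers_by_rack.values())
    return [b for column in columns for b in column if b is not None]


def assign(brokers_by_rack, num_partitions, replication_factor):
    """Give each partition consecutive brokers from the alternating list."""
    order = rack_alternating(brokers_by_rack)
    return {
        p: [order[(p + r) % len(order)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


# Two data centers treated as racks, two brokers each.
brokers_by_rack = {"dc1": [0, 1], "dc2": [2, 3]}
assignment = assign(brokers_by_rack, num_partitions=4, replication_factor=2)

for partition, replicas in assignment.items():
    print(partition, replicas)
```

With this layout every partition ends up with one replica in each data center, so losing a whole data center still leaves a copy of every partition.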
33
Example: EC2 multi-AZ Deployment
• Same region: essentially same network
• Asymmetric partitioning is rare, low latency
• Need at least 3 DCs for ZooKeeper
• Reserved instances to reduce churn
• EIP for external clients, private IPs for internal communication
• Reserved instances, local storage
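The "at least 3 DCs" rule follows from ZooKeeper needing a strict majority of its servers to stay available. A small sketch of the arithmetic, with hypothetical helper names and server layouts:

```python
def has_quorum(servers_per_dc, failed_dc):
    """True if a strict majority of servers survives losing failed_dc."""
    total = sum(servers_per_dc.values())
    alive = total - servers_per_dc[failed_dc]
    return alive > total // 2


def survives_any_single_dc_failure(servers_per_dc):
    """True if quorum holds no matter which one DC goes down."""
    return all(has_quorum(servers_per_dc, dc) for dc in servers_per_dc)


two_dcs = {"dc1": 2, "dc2": 1}
two_dcs_even = {"dc1": 2, "dc2": 2}
three_dcs = {"dc1": 1, "dc2": 1, "dc3": 1}

print(survives_any_single_dc_failure(two_dcs))
print(survives_any_single_dc_failure(two_dcs_even))
print(survives_any_single_dc_failure(three_dcs))
```

With two data centers, one of them always holds at least half of the servers, so losing it loses the quorum. Spreading the servers over three data centers lets any single one fail.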
34
Take-aways
• Multi-DC: trade-off between latency and consistency
• Kafka: replicated log streams for multihoming
Thank you
Guozhang | guozhang@confluent.io | @guozhangwang
Meet Confluent in booth #838
Confluent University ~ Kafka training ~ confluent.io/training
Join the Stream Data Hackathon Apr 25, SF: kafka-summit.org/hackathon/
Download Apache Kafka & Confluent Platform: confluent.io/download
