Lessons from managing a Pulsar cluster
Who am I?
● Senior Developer at Nutanix, responsible for all things Pulsar
● Love spending time with data (stores, streams, analytics, etc.)
● Ex-MySQL: started out with 3 great years building MySQL Replication
● Contributions to Pulsar & MySQL
https://www.linkedin.com/in/shivjijha/
https://twitter.com/ShivjiJha
What do we do?
● Helping customers manage cost and security for hybrid cloud
● Crunch (& stream) data to find insights about cost and security
● Needed pub/sub to store events and replay when required
https://www.nutanix.com/products/beam
Platforms We Use
How do we choose a platform?
Avoid bias towards familiar technology
The First Steps
Summarising the GitHub comment
1. Kafka alternative: the incubating Apache project Pulsar
2. Open sourced by Yahoo
3. Hundreds of billions of messages per day in Pulsar at Yahoo
4. Solves annoying problems in Kafka, such as:
a. Topic management
b. Disruptive rebalances
5. Same raw power (throughput, latencies, etc.)
6. Stateless brokers
7. Apache BookKeeper for storage
8. Stream + queue
Wow, that is a lot of promise!
First principles - Requirements?
1. Coordination
2. Persistence
3. Scale compute and storage independently
4. High Availability
5. Fault tolerance
6. Client ecosystem
Requirement # 1
✓ Coordination
Requirement # 2
✓ Persistence
Requirement # 3
✓ Scale compute and storage independently
Bookies => store
Brokers => serve msg
Requirement # 4
✓ High Availability
Replicated brokers
Replicated bookies
Requirement # 5
✓ Fault tolerance
✓ Replicated compute (brokers)
✓ Replicated store (bookkeeper / bookies)
✓ Tunable fault tolerance (bookkeeper)
✓ Ensemble size
✓ Write quorum size
✓ Ack quorum size
https://www.splunk.com/en_us/blog/it/why-apache-bookkeeper-part-1-consistency-durability-availability.html
When scaling the bookie cluster, fine-tune quorum sizes
set-persistence --ensemble 5 --writeQuorum 3 --ackQuorum 2
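The relationship between the three settings above can be sketched in a few lines of Python. This is only an illustration: the `Persistence` class and its property names are made up for this example and are not part of any Pulsar or BookKeeper client. It encodes the BookKeeper constraint that ensemble size >= write quorum >= ack quorum, using the values from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persistence:
    """Hypothetical helper mirroring the three quorum settings."""
    ensemble: int      # bookies a ledger's entries are spread across
    write_quorum: int  # copies written for each entry
    ack_quorum: int    # acknowledgements awaited before a write succeeds

    def validate(self) -> None:
        # BookKeeper requires ensemble >= write quorum >= ack quorum >= 1.
        if not (self.ensemble >= self.write_quorum >= self.ack_quorum >= 1):
            raise ValueError("need ensemble >= write_quorum >= ack_quorum >= 1")

    @property
    def guaranteed_copies(self) -> int:
        # An acknowledged entry is stored on at least ack_quorum bookies.
        return self.ack_quorum

    @property
    def striped(self) -> bool:
        # With ensemble > write_quorum, entries are striped across the ensemble.
        return self.ensemble > self.write_quorum


policy = Persistence(ensemble=5, write_quorum=3, ack_quorum=2)
policy.validate()
print(policy.guaranteed_copies, policy.striped)
```

Running a check like this before changing a namespace policy makes it harder to apply an inconsistent combination when the bookie count changes.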
Requirement # 6
✓ Client ecosystem
✓ Work in progress
✓ Compensating factors:
✓ Clients are easier to change; a client is just a library, after all!
✓ Very active community (Slack)
✓ Quick turnaround (and quick fixes) for critical issues
✓ Bonus features
✓ Load balancer auto balances topics among brokers
✓ Tiered storage
✓ Unified platform (Stream + Queue)
✓ Multi-tenant topic structure
Tuning Configurations
✓ Default configurations may be optimized for backward compatibility
✓ Not necessarily for performance
✓ Not necessarily for the latest features
✓ Perf test for your use cases and tune!
Performance Testing Pulsar with https://locust.io/
Test sync message
Test async message
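As a rough idea of what a synchronous send test measures, here is a standard-library-only sketch. It does not use Locust or a Pulsar client: `measure_latency` and `fake_send` are hypothetical names, and `fake_send` is a stand-in for a real blocking publish call that you would swap in.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(send: Callable[[bytes], None], payload: bytes, count: int) -> Dict[str, float]:
    """Call send(payload) count times and summarise per-call latency in ms."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        send(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "count": float(len(samples)),
        "mean_ms": statistics.fmean(samples),
        "p99_ms": samples[min(len(samples) - 1, int(0.99 * len(samples)))],
        "max_ms": samples[-1],
    }


def fake_send(payload: bytes) -> None:
    # Stand-in for a real blocking publish; replace with your producer's send.
    time.sleep(0.001)


report = measure_latency(fake_send, b"x" * 1024, count=50)
print(report)
```

For asynchronous sends, per-call timing like this is not meaningful; there you would time from submission to the completion callback instead.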
Tuning Configurations
✓ Durability vs throughput (bookkeeper.conf)
# Maximum latency to impose on a journal write to achieve grouping
journalMaxGroupWaitMSec=2
Tuning Configurations
✓ Disable auto recovery in bookkeeper when out for maintenance!
bookkeeper shell autorecovery -disable
STOP / MAINTENANCE / START
bookkeeper shell autorecovery -enable
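The disable / maintenance / enable sequence above is easy to get wrong if the maintenance step fails halfway. One possible way to guard it is a context manager, sketched below. The two `bookkeeper shell autorecovery` commands are the ones from the slide; `autorecovery_disabled`, `run_command` and the recording runner are hypothetical names for this example, and the demo is a dry run that never calls the real CLI.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def run_command(args: List[str]) -> None:
    # Runs the command and raises CalledProcessError on a non-zero exit code.
    subprocess.run(args, check=True)


@contextmanager
def autorecovery_disabled(run: Runner = run_command) -> Iterator[None]:
    run(["bookkeeper", "shell", "autorecovery", "-disable"])
    try:
        yield
    finally:
        # Re-enable even if the maintenance step raised.
        run(["bookkeeper", "shell", "autorecovery", "-enable"])


# Dry run with a recording runner instead of the real CLI.
issued: List[List[str]] = []
with autorecovery_disabled(run=issued.append):
    issued.append(["<stop bookie, do maintenance, start bookie>"])

for command in issued:
    print(" ".join(command))
```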
Tuning Configurations
✓ Auto recovery vs throughput (bookkeeper.conf)
✓ If you have a small number of bookies and a bookie goes down, auto recovery may overwhelm the remaining bookies
maxPendingReadRequestsPerThread=2500
# Number of entries that a replication will re-replicate in parallel
rereplicationEntryBatchSize=100
Contribute to stay in sync
1. Development is fast, in fact very fast
a. Don’t maintain forks, easier to contribute
2. We do the same!
https://github.com/apache/pulsar/graphs/contributors
Pulsar Use cases In Beam
&
Event Sourcing
1. Persisting your application's state by storing the history that determines the current state of your application.
State of application at any point in time
State of application at this instant of time
https://docs.microsoft.com/en-us/previous-versions/msp-n-p/jj591559(v=pandp.10)
Event Sourcing
● History of events
● Past-tense verbs
● Immutable
● Ordered
● Restore state at any point in time
● Use: CQRS, audit trail, etc.
https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing
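A minimal sketch of the idea, assuming a made-up cost-tracking domain: the `Event` fields, the event kinds `BudgetSet` and `CostIncurred`, and the `replay` function are all invented for illustration. Events are immutable, named with past-tense verbs and ordered; state at any point in time comes from replaying the log up to that point.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)  # events are immutable
class Event:
    sequence: int   # position in the ordered log
    kind: str       # past-tense verb, e.g. "BudgetSet"
    account: str
    amount: float


def replay(events: Iterable[Event], up_to: int) -> Dict[str, float]:
    """Rebuild per-account state from events with sequence <= up_to."""
    state: Dict[str, float] = {}
    for event in sorted(events, key=lambda e: e.sequence):
        if event.sequence > up_to:
            break
        if event.kind == "BudgetSet":
            state[event.account] = event.amount
        elif event.kind == "CostIncurred":
            state[event.account] = state.get(event.account, 0.0) - event.amount
    return state


log: List[Event] = [
    Event(1, "BudgetSet", "team-a", 100.0),
    Event(2, "CostIncurred", "team-a", 30.0),
    Event(3, "CostIncurred", "team-a", 20.0),
]

print(replay(log, up_to=2))  # state at an earlier point in time
print(replay(log, up_to=3))  # state at this instant
```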
Representing Events (Schema)
1. Pulsar supports bytes, string, Avro, Protobuf, JSON, etc.
2. Schemaless?
a. Any code that manipulates the data needs to make some assumptions about its structure.
b. All producers and consumers must know the hidden, implicit schema.
3. Opinion: use a schema as far as possible.
a. Pulsar supports a schema registry out of the box.
Representing Events (Schema)
1. Of course, schemalessness offers a pragmatic alternative at times.
https://martinfowler.com/articles/schemaless/#non-uniform-types
Add custom fields for UI etc.
Different attributes depending on kind of event
Obviously easy for schemaless, but still needs care!
What to put on ONE topic?
1. Two choices:
a. Topic == collection of events of same type
b. Topic == events that need relative ordering guarantee.
2. Winner: choice (b)
https://martin.kleppmann.com/2018/01/18/event-types-in-kafka-topic.html
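The difference between the two choices can be illustrated with a toy in-memory model. Everything here is hypothetical (the event names, the `customer-events` topic name and the routing functions), and plain lists stand in for topics; no broker is involved.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]  # (event_type, customer_id)

events: List[Event] = [
    ("CustomerCreated", "c1"),
    ("AddressChanged", "c1"),
    ("CustomerDeleted", "c1"),
]


def topic_per_type(event: Event) -> str:
    # Choice (a): one topic per event type.
    return event[0]


def topic_per_ordering_scope(event: Event) -> str:
    # Choice (b): everything that must stay ordered shares one topic.
    return "customer-events"


def publish(route: Callable[[Event], str]) -> Dict[str, List[Event]]:
    topics: Dict[str, List[Event]] = {}
    for event in events:
        topics.setdefault(route(event), []).append(event)
    return topics


by_type = publish(topic_per_type)
by_scope = publish(topic_per_ordering_scope)

# With (a) the three events sit on three topics, so nothing records their relative order.
print(len(by_type))
# With (b) the single topic preserves publication order.
print(by_scope["customer-events"] == events)
```

With choice (a), a consumer reading the three topics independently could see the delete before the create; choice (b) rules that out.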
Avro / Proto (Struct) Schema
1. Language-agnostic schema. Being stuck with one language sucks!
2. JSON seems the first pick if you use REST, but it is
a. slow,
b. too verbose, and
c. ships the complete schema with every message.
3. Avro and Protobuf are good.
4. We like Avro for its wide adoption.
a. And we use Pulsar's built-in schema registry.
5. Consider keeping the schema flat and fat (denormalize)!
https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
Schema Evolution
1. Choose a schema auto-update strategy that suits your use case.
a. We keep it forward compatible (add fields, delete optional fields).
b. Data produced with the new schema can be read by consumers using the last schema.
c. Update the producer first, then consumers when they have time / need.
2. Each Avro message carries an Avro schema id & version.
3. Decode with the exact writer schema.
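Forward compatibility as described above can be pictured with plain dictionaries. This is a simplified illustration, not Avro's actual schema-resolution algorithm and not a Pulsar API: `reader_schema_v1`, `REQUIRED` and `read_with` are invented names. A consumer still on the old schema reads a record from an updated producer that added one field and dropped an optional one.

```python
from typing import Any, Dict

# Reader (old) schema: field name -> default used when the writer omits it.
# A default of REQUIRED marks a field the writer must always supply.
REQUIRED = object()
reader_schema_v1: Dict[str, Any] = {"account": REQUIRED, "cost": REQUIRED, "note": None}


def read_with(reader_schema: Dict[str, Any], record: Dict[str, Any]) -> Dict[str, Any]:
    """Project a record written with a newer schema onto the reader's fields."""
    result = {}
    for field, default in reader_schema.items():
        if field in record:
            result[field] = record[field]
        elif default is REQUIRED:
            raise ValueError(f"writer dropped required field {field!r}")
        else:
            result[field] = default
    return result  # fields unknown to the reader are ignored


# Written by an updated producer: adds "region", drops optional "note".
record_v2 = {"account": "team-a", "cost": 12.5, "region": "us-west"}
print(read_with(reader_schema_v1, record_v2))
```

The old consumer keeps working without a redeploy, which is why the producer can be updated first.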
Summarizing Lessons
✓ Avoid bias towards the “known” when choosing a platform.
✓ Tune replication settings (ensemble, write quorum, ack quorum) when scaling out bookies horizontally.
✓ Use a schema as far as possible!
✓ Tune configuration for size, resources, throughput, durability, etc. Defaults may be optimized for backward compatibility.
✓ Disable bookie auto-recovery before taking a bookie down.
✓ Balance recovery with incoming user traffic.
✓ Put events that require ordering on the same topic.
Stay Connected:
● Pulsar Mailing Lists
○ users@pulsar.apache.org
○ dev@pulsar.apache.org
● Pulsar Slack
○ https://apache-pulsar.slack.com
● You can contact me at:
○ https://twitter.com/ShivjiJha
○ https://www.linkedin.com/in/shivjijha/
Q & A Time

More Related Content

What's hot

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...StreamNative
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformJean-Paul Azar
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Erik Onnen
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIALa Cuisine du Web
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Knoldus Inc.
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
Transaction preview of Apache Pulsar
Transaction preview of Apache PulsarTransaction preview of Apache Pulsar
Transaction preview of Apache PulsarStreamNative
 
When apache pulsar meets apache flink
When apache pulsar meets apache flinkWhen apache pulsar meets apache flink
When apache pulsar meets apache flinkStreamNative
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafkaconfluent
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Integrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemIntegrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemStreamNative
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 

What's hot (20)

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIA
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Transaction preview of Apache Pulsar
Transaction preview of Apache PulsarTransaction preview of Apache Pulsar
Transaction preview of Apache Pulsar
 
Kafka
KafkaKafka
Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
When apache pulsar meets apache flink
When apache pulsar meets apache flinkWhen apache pulsar meets apache flink
When apache pulsar meets apache flink
 
Kafka on Pulsar
Kafka on Pulsar Kafka on Pulsar
Kafka on Pulsar
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Integrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemIntegrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data Ecosystem
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 

Similar to lessons from managing a pulsar cluster

Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)StreamNative
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...HostedbyConfluent
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Cassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraCassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraDave Gardner
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloJoe Stein
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasDataWorks Summit
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafkaconfluent
 
thinking in key value stores
thinking in key value storesthinking in key value stores
thinking in key value storesBhasker Kode
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...C4Media
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 

Similar to lessons from managing a pulsar cluster (20)

Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
reBuy on Kubernetes
reBuy on KubernetesreBuy on Kubernetes
reBuy on Kubernetes
 
Cassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraCassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache Cassandra
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Event driven-arch
Event driven-archEvent driven-arch
Event driven-arch
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and Atlas
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
 
thinking in key value stores
thinking in key value storesthinking in key value stores
thinking in key value stores
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 

More from Shivji Kumar Jha

Navigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesNavigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesShivji Kumar Jha
 
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesShivji Kumar Jha
 
pulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptxpulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptxShivji Kumar Jha
 
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...Shivji Kumar Jha
 
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with PulsarPulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with PulsarShivji Kumar Jha
 
Pulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationPulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationShivji Kumar Jha
 
Event sourcing Live 2021: Streaming App Changes to Event Store
Event sourcing Live 2021: Streaming App Changes to Event StoreEvent sourcing Live 2021: Streaming App Changes to Event Store
Event sourcing Live 2021: Streaming App Changes to Event StoreShivji Kumar Jha
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesShivji Kumar Jha
 
FOSSASIA 2015: MySQL Group Replication
FOSSASIA 2015: MySQL Group ReplicationFOSSASIA 2015: MySQL Group Replication
FOSSASIA 2015: MySQL Group ReplicationShivji Kumar Jha
 
MySQL High Availability with Replication New Features
MySQL High Availability with Replication New FeaturesMySQL High Availability with Replication New Features
MySQL High Availability with Replication New FeaturesShivji Kumar Jha
 
MySQL Developer Day conference: MySQL Replication and Scalability
MySQL Developer Day conference: MySQL Replication and ScalabilityMySQL Developer Day conference: MySQL Replication and Scalability
MySQL Developer Day conference: MySQL Replication and ScalabilityShivji Kumar Jha
 
MySQL User Camp: MySQL Cluster
MySQL User Camp: MySQL ClusterMySQL User Camp: MySQL Cluster
MySQL User Camp: MySQL ClusterShivji Kumar Jha
 
Open source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source ReplicationOpen source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source ReplicationShivji Kumar Jha
 
MySQL User Camp: Multi-threaded Slaves
MySQL User Camp: Multi-threaded SlavesMySQL User Camp: Multi-threaded Slaves
MySQL User Camp: Multi-threaded SlavesShivji Kumar Jha
 

More from Shivji Kumar Jha (16)

Navigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesNavigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern Databases
 
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
 
pulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptxpulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptx
 
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
 
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar

lessons from managing a pulsar cluster

  • 2. Who am I? ● Senior Developer at Nutanix, responsible for all things Pulsar ● Love spending time with data (stores, streams, analytics, etc.) ● Ex-MySQL: started out with 3 great years building MySQL Replication ● Contributions to Pulsar & MySQL https://www.linkedin.com/in/shivjijha/ https://twitter.com/ShivjiJha
  • 3. What do we do? ● Help customers manage cost and security for hybrid cloud ● Crunch (& stream) data to find insights about cost and security ● Needed pub/sub to store events and replay them when required https://www.nutanix.com/products/beam
  • 6. How do we Choose a platform ??
  • 9. Summarising the GitHub comment 1. Kafka alternative: incubating Apache project Pulsar 2. Open sourced by Yahoo 3. Hundreds of billions of messages per day in Pulsar at Yahoo 4. Solves annoying problems in Kafka such as: a. Topic management b. Disruptive rebalances 5. Same raw power (throughput, latencies, etc.) 6. Stateless brokers 7. Apache BookKeeper for storage 8. Stream + queue
  • 10. Wow, that is a lot of Promise!!
  • 11. First principles - Requirements? 1. Coordination 2. Persistence 3. Scale compute and storage independently 4. High Availability 5. Fault tolerance 6. Client ecosystem
  • 12. Requirement # 1 ✓ Coordination
  • 15. Requirement # 2 ✓ Persistence
  • 19-23. Requirement # 3 ✓ Scale compute and storage independently: Bookies => store; Brokers => serve messages
  • 24-27. Requirement # 4 ✓ High Availability: replicated brokers, replicated bookies
  • 28-32. Requirement # 5 ✓ Fault tolerance
      ✓ Replicated compute (brokers)
      ✓ Replicated store (BookKeeper / bookies)
      ✓ Tunable fault tolerance (BookKeeper)
          ✓ Ensemble size
          ✓ Write quorum size
          ✓ Ack quorum size
      https://www.splunk.com/en_us/blog/it/why-apache-bookkeeper-part-1-consistency-durability-availability.html
      When scaling the bookie cluster, fine-tune the quorum sizes:
      set-persistence --ensemble 5 --writeQuorum 3 --ackQuorum 2
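How the three sizes relate can be sketched in a few lines of Python. This is a simplified model, assuming entries are striped round-robin across the ensemble; the function names are illustrative and are not part of any Pulsar or BookKeeper API.

```python
def validate(ensemble, write_quorum, ack_quorum):
    """The sizes must satisfy ensemble >= write quorum >= ack quorum >= 1."""
    if not (ensemble >= write_quorum >= ack_quorum >= 1):
        raise ValueError("need ensemble >= writeQuorum >= ackQuorum >= 1")


def write_set(entry_id, ensemble, write_quorum):
    """Indexes (within the ensemble) of the bookies that receive this entry,
    assuming simple round-robin striping."""
    start = entry_id % ensemble
    return [(start + i) % ensemble for i in range(write_quorum)]


def tolerated_bookie_losses(ack_quorum):
    """An acknowledged entry is stored on at least ack_quorum bookies,
    so it survives the permanent loss of ack_quorum - 1 of them."""
    return ack_quorum - 1


# Values taken from the set-persistence example on the slide.
E, QW, QA = 5, 3, 2
validate(E, QW, QA)
for entry_id in range(E):
    print(entry_id, write_set(entry_id, E, QW))
print("acknowledged entries survive", tolerated_bookie_losses(QA), "bookie loss(es)")
```

With an ensemble larger than the write quorum, each bookie holds only a share of the entries, which is why the quorum sizes are worth revisiting when bookies are added.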
  • 33-36. Requirement # 6 ✓ Client ecosystem
      ✓ Work in progress
      ✓ Compensating factors:
          ✓ Clients are easier to change, just a library after all!
          ✓ Very active community (Slack)
          ✓ Quick turnaround (and quick fixes) for critical issues
      ✓ Bonus features:
          ✓ Load balancer auto-balances topics among brokers
          ✓ Tiered storage
          ✓ Unified platform (Stream + Queue)
          ✓ Multi-tenant topic structure
  • 37. Tuning Configurations ✓ Default configurations may be optimized for backward compatibility, not necessarily for performance or for the latest features ✓ Perf test for your use cases and tune!
  • 41. Tuning Configurations ✓ Durability vs throughput (bookkeeper.conf)
      # Maximum latency to impose on a journal write to achieve grouping
      journalMaxGroupWaitMSec=2
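The trade-off behind a group-wait setting can be illustrated with a toy model: a longer wait lets more journal writes share one flush (fewer flushes, higher throughput) at the cost of added latency per write. The sketch below is only an illustration under that assumption. It ignores the size and entry-count thresholds a real journal also applies, and the function names are hypothetical.

```python
def group_commits(arrivals_ms, max_group_wait_ms):
    """Group write arrival times (in ms) into flush batches.
    A batch opens with the first write and accepts every write that
    arrives within max_group_wait_ms of that first write."""
    groups = []
    for t in sorted(arrivals_ms):
        if groups and t - groups[-1][0] <= max_group_wait_ms:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def added_latencies(groups, max_group_wait_ms):
    """Extra delay per write if each batch is flushed max_group_wait_ms
    after its first write arrived."""
    return [g[0] + max_group_wait_ms - t for g in groups for t in g]


arrivals = [0, 1, 2, 5, 6, 10]
for wait in (0, 2):
    groups = group_commits(arrivals, wait)
    print(f"wait={wait}ms flushes={len(groups)} "
          f"max added latency={max(added_latencies(groups, wait))}ms")
```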
  • 42. Tuning Configurations ✓ Disable auto recovery in BookKeeper when a bookie is out for maintenance!
      bookkeeper shell autorecovery -disable
      STOP / MAINTENANCE / START
      bookkeeper shell autorecovery -enable
  • 43. Tuning Configurations ✓ Auto recovery vs throughput (bookkeeper.conf) ✓ If you have a small number of bookies and one goes down, auto recovery may overwhelm the remaining bookies ✓ Limit on pending read requests per thread: maxPendingReadRequestsPerThread=2500 ✓ Number of entries that a replication will re-replicate in parallel: rereplicationEntryBatchSize=100
  • 44-45. Contribute to stay in sync 1. Development is fast, in fact very fast a. Don’t maintain forks; it is easier to contribute 2. We do the same! https://github.com/apache/pulsar/graphs/contributors
  • 46. Pulsar Use cases In Beam &
  • 47. Event Sourcing 1. Persisting your application's state by storing the history that determines the current state of your application. Figure labels: "State of application at any point in time"; "State of application at this instant of time". https://docs.microsoft.com/en-us/previous-versions/msp-n-p/jj591559(v=pandp.10)
  • 48. Event Sourcing ● History of events ● Past-tense verbs ● Immutable ● Ordered ● Restore the state at any point in time ● Uses: CQRS, audit trail, etc. https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing
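The properties listed above (immutable, ordered, past-tense events that can be replayed to any point) can be sketched in Python. The event kinds `BudgetIncreased` and `CostIncurred` and the running-balance state are hypothetical examples chosen for illustration, not the events used in Beam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event is immutable once recorded
class Event:
    seq: int      # position in the ordered history
    kind: str     # past-tense verb describing what happened
    amount: int


def apply(balance, event):
    """Fold one event into the current state (here, a remaining budget)."""
    if event.kind == "BudgetIncreased":
        return balance + event.amount
    if event.kind == "CostIncurred":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def state_at(events, upto_seq):
    """Rebuild the state as of a point in the history by replaying events in order."""
    state = 0
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq > upto_seq:
            break
        state = apply(state, event)
    return state


history = [
    Event(1, "BudgetIncreased", 100),
    Event(2, "CostIncurred", 30),
    Event(3, "CostIncurred", 20),
]
print(state_at(history, 2))  # state after the first two events
print(state_at(history, 3))  # current state
```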
  • 49. Representing Events (Schema) 1. Pulsar supports bytes, string, Avro, protobuf, JSON, etc. 2. Schemaless? a. Any code that manipulates the data needs to make some assumptions about its structure. b. All producers and consumers must know the hidden, implicit schema. 3. Opinion: use a schema as far as possible. a. Pulsar supports a schema registry out of the box.
  • 50-53. Representing Events (Schema) 1. Of course, schemalessness offers a pragmatic alternative at times. https://martinfowler.com/articles/schemaless/#non-uniform-types ● Add custom fields for UI etc. ● Different attributes depending on the kind of event ● Obviously easy for schemaless, but still needs care!
  • 54-55. What to put on ONE topic? 1. Two choices: a. Topic == collection of events of the same type b. Topic == events that need a relative ordering guarantee 2. Winner: choice (b) https://martin.kleppmann.com/2018/01/18/event-types-in-kafka-topic.html
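Choice (b) can be illustrated with a small Python sketch: events of different types about the same entity share one topic and one key, so their relative order is preserved. The `crc32` hash is a stand-in chosen for this example (Pulsar clients use their own key hashing), and the entity keys and event names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition. Same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (key, payload) to its partition, preserving arrival order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


# Mixed event types on ONE topic, keyed by the entity they describe.
events = [
    ("account-1", "AccountCreated"),
    ("account-2", "AccountCreated"),
    ("account-1", "BudgetSet"),
    ("account-1", "AccountDeleted"),
]
partitions = route(events, num_partitions=4)
p = partition_for("account-1", 4)
print([payload for key, payload in partitions[p] if key == "account-1"])
```

Had each event type gone to its own topic, a consumer could see `AccountDeleted` before `AccountCreated` for the same account.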
  • 56. Avro / Proto (Struct) Schema
      1. Language-agnostic schema. Being stuck with one language sucks!
      2. JSON seems the first pick if you use REST, but it is
          a. slow,
          b. too verbose, and
          c. ships the complete schema with every message.
      3. Avro and proto are good.
      4. We like Avro for its wide adoption.
          a. And we use Pulsar’s built-in schema registry.
      5. Consider keeping the schema flat and fat (denormalize)!
      https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
  • 57. Schema Evolution
      1. Choose a schema-auto-update strategy that suits your use case.
          a. We keep it forward compatible (add fields, delete optional fields).
          b. Data produced with the new schema can be read by consumers using the previous schema.
          c. Update the producer first, then the consumers when they have time / need.
      2. Each Avro message contains an Avro schema id & version.
      3. Decode with the exact writer schema.
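The forward-compatibility rule above (add fields, delete only optional fields) can be checked mechanically. The sketch below is a deliberately reduced model: a schema is just a mapping from field name to whether the field has a default, and type changes are ignored. It is neither Avro's schema-resolution algorithm nor Pulsar's compatibility checker, and the field names are hypothetical.

```python
def forward_compatibility_problems(old_schema, new_schema):
    """Return the fields that break forward compatibility.

    Forward compatible means a consumer still on old_schema can read data
    written with new_schema. That fails when the writer drops a field the
    old reader requires (one with no default to fall back on).
    Schemas are dicts: field name -> True if the field has a default.
    """
    return sorted(
        name
        for name, has_default in old_schema.items()
        if name not in new_schema and not has_default
    )


old = {"id": False, "cost": False, "note": True}        # "note" is optional
new_ok = {"id": False, "cost": False, "region": False}  # adds a field, drops an optional one
new_bad = {"id": False}                                 # drops required "cost"

print(forward_compatibility_problems(old, new_ok))
print(forward_compatibility_problems(old, new_bad))
```

An empty result means the producer can be upgraded first, with consumers following when they have time.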
  • 58. Summarizing Lessons ✓ Avoid bias to “known” when choosing a platform. ✓ Tune re-replication (ensemble, write quorum, ack quorum) when scaling out bookies horizontally. ✓ Use schema, as far as possible! ✓ Tune configuration for size, resource, throughput, durability etc. May be optimized for backward compatibility. ✓ Disable auto-recovery of bookie before taking down. ✓ Balance recovery with incoming user traffic. ✓ Put events that require ordering on same topic.
  • 59. Stay Connected: ● Pulsar Mailing Lists ○ users@pulsar.apache.org ○ dev@pulsar.apache.org ● Pulsar Slack ○ https://apache-pulsar.slack.com ● You can contact me at: ○ https://twitter.com/ShivjiJha ○ https://www.linkedin.com/in/shivjijha/ Q & A Time