Having used Apache Pulsar in production for a year for our pub/sub use cases (stream analytics, event sourcing, etc.), this slide deck presents the lessons learned: understanding the architecture, tuning the cluster, keeping it highly available and fault tolerant, and much more.
While the slides are framed in terms of Apache Pulsar, many of the concepts extend readily to other distributed systems.
The views here are my own and do not represent the views of Nutanix.
Lessons from Managing a Pulsar Cluster
2. Who am I?
● Senior Developer at Nutanix, responsible for all things Pulsar
● Love spending time with data (stores, streams, analytics, etc.)
● Ex-MySQL: started out with 3 great years building MySQL Replication
● Contributions to Pulsar & MySQL
https://www.linkedin.com/in/shivjijha/
https://twitter.com/ShivjiJha
3. What do we do?
● Helping customers manage cost and security for hybrid cloud
● Crunch (& stream) data to find insights about cost and security
● Needed pub/sub to store events and replay them when required
https://www.nutanix.com/products/beam
9. Summarising the GitHub comment
1. Kafka alternative: Pulsar, an incubating Apache project
2. Open sourced by Yahoo
3. Hundreds of billions of messages per day through Pulsar at Yahoo
4. Solves annoying problems in Kafka such as:
a. Topic management
b. Disruptive rebalances
5. Same raw power (throughput, latencies, etc.)
6. Stateless brokers
7. Apache BookKeeper for storage
8. Stream + queue
35. Requirement # 6
✓ Client ecosystem
✓ Work in progress
✓ Compensating factors:
✓ Clients are easier to change; it's just a library, after all!
✓ Very active community (slack)
✓ Quick turnaround (and quick fixes) for critical issues
✓ Bonus features
✓ Load balancer auto balances topics among brokers
✓ Tiered storage
✓ Unified platform (Stream + Queue)
✓ Multi-tenant topic structure
37. Tuning Configurations
✓ Default configurations may be optimized for backward compatibility
✓ Not necessarily for performance
✓ Not necessarily for the latest features
✓ Perf test for your use cases and tune!
41. Tuning Configurations
✓ Durability vs throughput (bookkeeper.conf)
# Maximum latency to impose on a journal write to achieve grouping
journalMaxGroupWaitMSec=2
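As an illustration of the durability vs throughput trade-off, a bookkeeper.conf fragment might look like the following; the values here are illustrative, not recommendations, and should come out of perf testing for your workload:

```properties
# Group journal writes arriving within this window into one flush;
# a larger window improves throughput at the cost of write latency.
journalMaxGroupWaitMSec=2

# Whether to fsync journal data to disk before acknowledging a write.
# Setting this to false trades durability for throughput.
journalSyncData=true
```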
42. Tuning Configurations
✓ Disable auto-recovery in BookKeeper when taking a bookie out for maintenance!
bookkeeper shell autorecovery -disable
STOP bookie / MAINTENANCE / START bookie
bookkeeper shell autorecovery -enable
43. Tuning Configurations
✓ Auto-recovery vs throughput (bookkeeper.conf)
✓ If you have a small number of bookies and a bookie goes down, auto-recovery
traffic may overwhelm the remaining bookies
✓ Throttle how many entries the replication worker re-replicates at a time:
maxPendingReadRequestsPerThread=2500
rereplicationEntryBatchSize=100
45. Contribute to stay in sync
1. Development is fast, in fact very fast
a. Don't maintain forks; it's easier to contribute upstream
2. We do the same!
https://github.com/apache/pulsar/graphs/contributors
47. Event Sourcing
1. Persisting your application's state by storing the history of events that
determines the current state of your application.
(Diagram: the event history yields the state of the application at any point
in time, not just its state at this instant.)
https://docs.microsoft.com/en-us/previous-versions/msp-n-p/jj591559(v=pandp.10)
48. Event Sourcing
● History of events
● Past-tense verbs
● Immutable
● Ordered
● Restore state at any point in time
● Uses: CQRS, audit trail, etc.
https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing
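The core idea above, rebuilding current state by replaying the ordered event history, can be sketched in plain Python; the event names and the account domain here are hypothetical, not from the deck:

```python
# Minimal event-sourcing sketch: state is never stored directly;
# it is derived by replaying an ordered, immutable event log.

def apply(balance, event):
    """Apply a single past-tense event to the current state."""
    kind, amount = event
    if kind == "MoneyDeposited":
        return balance + amount
    if kind == "MoneyWithdrawn":
        return balance - amount
    raise ValueError(f"unknown event: {kind}")

def replay(events):
    """Fold the full history to get the state at a point in time."""
    balance = 0
    for event in events:
        balance = apply(balance, event)
    return balance

log = [("MoneyDeposited", 100), ("MoneyWithdrawn", 30), ("MoneyDeposited", 5)]
print(replay(log))        # state at this instant -> 75
print(replay(log[:2]))    # state at an earlier point in time -> 70
```

Replaying a prefix of the log is exactly the "restore state at any point in time" property listed above.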
49. Representing Events (Schema)
1. Pulsar supports bytes, string, Avro, Protobuf, JSON, etc.
2. Schemaless?
a. Any code that manipulates the data needs to make some assumptions about its
structure.
b. All producers and consumers must know the hidden, implicit schema.
3. Opinion: use a schema wherever possible.
a. Pulsar supports a schema registry out of the box.
53. Representing Events (Schema)
1. Of course, schemalessness offers a pragmatic alternative at times:
a. Custom fields added for the UI, etc.
b. Different attributes depending on the kind of event
c. Obviously easy when schemaless, but it still needs care!
https://martinfowler.com/articles/schemaless/#non-uniform-types
55. What to put on ONE topic?
1. Two choices:
a. Topic == a collection of events of the same type
b. Topic == the events that need a relative ordering guarantee
2. Winner: choice (b)
https://martin.kleppmann.com/2018/01/18/event-types-in-kafka-topic.html
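To illustrate why choice (b) wins, here is a small Python sketch with a hypothetical event stream (not from the deck): splitting causally related events into per-type topics loses their relative order, while one topic preserves it.

```python
# Causally related events for one user, in the order they happened.
history = [("UserCreated", "u1"), ("UserUpdated", "u1"), ("UserDeleted", "u1")]

# Choice (a): one topic per event type. Ordering is only guaranteed
# within each topic, so a consumer reading the topics independently
# may observe them in any interleaving.
by_type = {}
for kind, key in history:
    by_type.setdefault(kind, []).append((kind, key))

# One perfectly legal consumption order under choice (a):
interleaved = by_type["UserDeleted"] + by_type["UserCreated"] + by_type["UserUpdated"]
print(interleaved[0])  # the delete can be seen before the create!

# Choice (b): put all events needing relative order on one topic;
# a consumer replays them exactly as produced.
print([kind for kind, _ in history])
```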
56. Avro / Proto (Struct) Schema
1. Language-agnostic schema. Being stuck with one language sucks!
2. JSON seems the natural first pick if you use REST, but it is
a. slow, and
b. too verbose:
c. the full structure (every field name) is shipped with every message
3. Avro and Protobuf are good.
4. We like Avro for its wide adoption
a. and use Pulsar's built-in schema registry.
5. Consider keeping the schema flat and fat (denormalize)!
https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
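A rough stdlib-only illustration of the verbosity point, using `struct` as a stand-in for a binary encoding like Avro in which the schema lives outside the message; the event fields are made up for the example:

```python
import json
import struct

# One event, encoded two ways.
event = {"user_id": 42, "amount_cents": 1999, "kind": 1}

# JSON repeats every field name inside every single message.
as_json = json.dumps(event).encode("utf-8")

# A binary layout carries only the values; the field names and
# types are described once, in a schema kept alongside the topic.
as_binary = struct.pack("<qqb", event["user_id"], event["amount_cents"], event["kind"])

print(len(as_json), len(as_binary))  # the binary form is far smaller
```

Real Avro adds varint encoding and a schema registry lookup on top of this, but the size argument is the same.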
57. Schema Evolution
1. Choose a schema-auto-update strategy that suits your use case.
a. We keep it forward compatible (add fields, delete optional fields).
b. Data produced with the new schema can be read by consumers using the last schema.
c. Update the producer first, then consumers when they have the time / need.
2. Each Avro message carries an Avro schema id & version.
3. Decode with the exact writer schema.
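The forward-compatible flow above can be sketched without an Avro library: a reader pinned to the old schema projects out only the fields it knows, ignoring the producer's new field. The schema shapes here are hypothetical, and real Avro schema resolution works the same way in principle:

```python
# Old (reader) schema, as field-name -> default pairs, versus a
# message produced with a newer, forward-compatible writer schema.
READER_FIELDS = {"user_id": 0, "amount_cents": 0}
new_message = {"user_id": 42, "amount_cents": 1999, "currency": "USD"}  # field added

def read_with(reader_fields, message):
    """Project the writer's record onto the reader's schema:
    unknown new fields are ignored, missing fields get defaults."""
    return {name: message.get(name, default) for name, default in reader_fields.items()}

decoded = read_with(READER_FIELDS, new_message)
print(decoded)  # the old consumer keeps working, unaware of "currency"
```

This is why the producer can be upgraded first: existing consumers simply never see the added field until they adopt the new schema.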
58. Summarizing Lessons
✓ Avoid bias toward the "known" when choosing a platform.
✓ Tune re-replication (ensemble, write quorum, ack quorum) when
scaling out bookies horizontally.
✓ Use a schema wherever possible!
✓ Tune configuration for size, resources, throughput, durability, etc.
Defaults may be optimized for backward compatibility.
✓ Disable auto-recovery before taking a bookie down.
✓ Balance recovery traffic against incoming user traffic.
✓ Put events that require ordering on the same topic.
59. Stay Connected:
● Pulsar Mailing Lists
○ users@pulsar.apache.org
○ dev@pulsar.apache.org
● Pulsar Slack
○ https://apache-pulsar.slack.com
● You can contact me at:
○ https://twitter.com/ShivjiJha
○ https://www.linkedin.com/in/shivjijha/
Q & A Time