18. The New York Times on
This architecture - Fabrik - has dozens of RabbitMQ instances
spread across 6 AWS zones in Oregon and Dublin.
Upon launch today, the system autoscaled to ~500,000 users.
Connection times remained flat at ~200ms.
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2014-January/032943.html
29. Culprits
• Flushing the queue_index journal
• messages published into the queue index when
paging
• memory leaks related to refc binaries
30. Flushing the QI Journal
• Sparse Array by Message SeqIds
• Publishes, Delivers and Acks
• Queue Index is written in Append Only Segments
• RabbitMQ keeps an in memory Journal of ops to
prevent going to disk all the time
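The journal idea above can be sketched in a few lines. This is only an illustrative Python model, not RabbitMQ's Erlang implementation: the class name, the op names and the per-segment entry count are assumptions made for the example.

```python
from collections import defaultdict

SEGMENT_ENTRY_COUNT = 16384  # assumed number of sequence ids per segment (illustrative)


class QueueIndexJournal:
    """Toy in-memory journal that buffers index operations before a flush."""

    def __init__(self, max_entries=65536):
        self.max_entries = max_entries
        self.pending = []                  # buffered (seq_id, op) pairs
        self.segments = defaultdict(list)  # stands in for append-only segment files
        self.flush_count = 0

    def record(self, seq_id, op):
        """Buffer a publish, deliver or ack; flush once the journal is full."""
        if op not in ("publish", "deliver", "ack"):
            raise ValueError(f"unknown op: {op}")
        self.pending.append((seq_id, op))
        if len(self.pending) >= self.max_entries:
            self.flush()

    def flush(self):
        """Append every buffered op to the segment that owns its sequence id."""
        for seq_id, op in self.pending:
            segment = seq_id // SEGMENT_ENTRY_COUNT
            self.segments[segment].append((seq_id % SEGMENT_ENTRY_COUNT, op))
        self.pending.clear()
        self.flush_count += 1


journal = QueueIndexJournal(max_entries=4)
for seq_id in range(3):
    journal.record(seq_id, "publish")
journal.record(0, "ack")  # fourth entry fills the journal and triggers the flush
```

With a small `max_entries` the flush happens after four operations; with a large limit, disk is touched far less often, but each flush has more work to do at once.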
31. Flushing the QI Journal
queue_index_max_journal_entries = 65536
33. Flushing the QI Journal
entry_to_segment/3 called 65536 times
Appending data to a binary
and then writing that to disk
34. Solution
• Keep a Sparse Array as a cache of “entry to
segments”
• Flush said array to disk whenever the Journal
needs to be flushed
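One way to read this solution: encode each entry when it is recorded and keep the encoded bytes per segment, so that a flush becomes one write per segment instead of one conversion per journal entry. The sketch below assumes a made-up 3-byte encoding and hypothetical names (`CachedJournal`, `encode_entry`); it is not the actual RabbitMQ segment format.

```python
import io
import struct
from collections import defaultdict

SEGMENT_ENTRY_COUNT = 16384  # assumed segment size, for illustration only
OP_CODES = {"publish": 0, "deliver": 1, "ack": 2}


def encode_entry(seq_id, op):
    """Hypothetical on-disk encoding: 1-byte op code + 2-byte offset in segment."""
    return struct.pack(">BH", OP_CODES[op], seq_id % SEGMENT_ENTRY_COUNT)


class CachedJournal:
    """Encodes entries as they arrive so a flush is one write per segment."""

    def __init__(self):
        self.cache = defaultdict(bytearray)  # segment number -> encoded entries

    def record(self, seq_id, op):
        segment = seq_id // SEGMENT_ENTRY_COUNT
        self.cache[segment] += encode_entry(seq_id, op)

    def flush(self, open_segment):
        """Write each cached buffer with a single call, then reset the cache."""
        writes = 0
        for segment, payload in sorted(self.cache.items()):
            open_segment(segment).write(bytes(payload))
            writes += 1
        self.cache.clear()
        return writes


# In-memory stand-ins for append-only segment files.
files = defaultdict(io.BytesIO)
journal = CachedJournal()
for seq_id in (0, 1, 16384):
    journal.record(seq_id, "publish")
journal.record(0, "ack")
writes = journal.flush(lambda segment: files[segment])
```

Four journal entries touching two segments result in two writes; the per-entry encoding cost is spread over the publishes rather than paid in one burst at flush time.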
40. Message Publishes into the Queue Index
• Implemented Batch Publishing into the Queue
Index
• Batch Handling of Delivers and Acks
• Flush the Publishes, Delivers and Acks cache every
20,000 messages or when paging has finished.
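The batching rule above (flush every N operations, or when paging ends) could look roughly like this. The class and callback names are hypothetical and the example uses a batch size of 3 so the behaviour is easy to follow; the slide's figure is 20,000.

```python
class BatchingIndexWriter:
    """Collects index operations and hands them to the index in batches."""

    def __init__(self, write_batch, batch_size=20_000):
        self.write_batch = write_batch  # callable receiving a list of (op, seq_id)
        self.batch_size = batch_size
        self.buffer = []

    def add(self, op, seq_id):
        self.buffer.append((op, seq_id))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def paging_finished(self):
        """Flush whatever is left once paging to disk has completed."""
        self.flush()

    def flush(self):
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []  # start a fresh list; the flushed batch stays intact


batches = []
writer = BatchingIndexWriter(batches.append, batch_size=3)
for seq_id in range(7):
    writer.add("publish", seq_id)
writer.paging_finished()
batch_sizes = [len(batch) for batch in batches]
```

Seven publishes produce two full batches and one partial batch that is only written when `paging_finished()` is called.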
42. Binary Memory Leak
• Related to Paging
• discovered with recon
• forced queue process garbage collection after
paging
http://ferd.github.io/recon/index.html
51. Solution III
• Lazy-Queue concept
• Disable in-memory message cache
• Only load messages in RAM if consumers are
present
• Implement Lazy Queues for all Backing Queue
implementations
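As a rough illustration of the lazy-queue concept, the toy queue below writes every message straight to disk and reads it back into RAM only when a consumer is attached. The file-per-message layout, the use of `pickle` and all names are assumptions chosen for brevity, not how RabbitMQ stores messages.

```python
import os
import pickle
import tempfile


class LazyQueue:
    """Toy queue that keeps message bodies on disk until a consumer asks."""

    def __init__(self, directory):
        self.directory = directory
        self.head = 0   # next sequence id to deliver
        self.tail = 0   # next sequence id to assign
        self.consumers = 0

    def _path(self, seq_id):
        return os.path.join(self.directory, f"{seq_id}.msg")

    def publish(self, message):
        """Write the message straight to disk; nothing is cached in RAM."""
        with open(self._path(self.tail), "wb") as handle:
            pickle.dump(message, handle)
        self.tail += 1

    def deliver(self):
        """Load the oldest message into RAM only if a consumer is attached."""
        if self.consumers == 0 or self.head == self.tail:
            return None
        path = self._path(self.head)
        with open(path, "rb") as handle:
            message = pickle.load(handle)
        os.remove(path)
        self.head += 1
        return message


with tempfile.TemporaryDirectory() as directory:
    queue = LazyQueue(directory)
    queue.publish({"body": "first"})
    queue.publish({"body": "second"})
    without_consumer = queue.deliver()   # no consumer yet, nothing is loaded
    queue.consumers = 1
    with_consumer = queue.deliver()      # oldest message is read from disk
    remaining_files = len(os.listdir(directory))
```

The trade-off is the usual one: memory use stays flat regardless of queue depth, at the price of a disk read on every delivery.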
55. Other Improvements
• moved to erlang.mk
• moving to Ranch/Cowboy and related libraries
• improved management UI performance
• improved rabbitmqctl performance
57. Future
• add Raft-based queue mirroring
• add Raft-based cluster formation
• Rethink rabbitmqctl to allow for extensions
• Tools to recover data from disk