At Onebip we developed a reporting system based on CQRS (Command Query Responsibility Segregation) and Event Sourcing using MongoDB.
In this talk I will introduce the CQRS and Event Sourcing concepts, describe our path and the technical and conceptual challenges we faced, and discuss the strengths of our solution and the parts where there is room for improvement.
5. About Onebip
Mobile payment platform.
Start-up born in 2005, acquired by the Neomobile group in 2011.
Onebip today:
- 70 countries
- 200+ carriers
- 5 billion potential users
42. Commands
Anything that happens in one of your domains
is triggered by a command and generates one
or more events.
Order received -> Payment sent -> Items queued -> Confirmation email sent
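The flow above can be sketched as a command handler that returns the resulting events in order; all names and the payload shape are illustrative, not Onebip's actual code:

```python
import uuid
from datetime import datetime, timezone

def handle_place_order(order_id, items):
    """Handle a hypothetical PlaceOrder command by emitting the
    resulting events, in the order they happened."""
    now = datetime.now(timezone.utc).isoformat()

    def event(event_type, data):
        # Each event gets a client-generated id and creation metadata.
        return {"_id": str(uuid.uuid4()), "type": event_type,
                "data": {**data, "meta": {"created_at": now}}}

    return [
        event("OrderReceived", {"order_id": order_id, "items": items}),
        event("PaymentSent", {"order_id": order_id}),
        event("ItemsQueued", {"order_id": order_id, "items": items}),
        event("ConfirmationEmailSent", {"order_id": order_id}),
    ]

events = handle_place_order("order-1", ["sku-42"])
print([e["type"] for e in events])
# ['OrderReceived', 'PaymentSent', 'ItemsQueued', 'ConfirmationEmailSent']
```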
43. Query
Generate read models from events depending on how the data actually needs to be used (by users and other application internals)
44. Event Sourcing
The fundamental idea of Event Sourcing is that of ensuring
every change to the state of an application is captured in an
event object, and that these event objects are themselves
stored in the sequence they were applied.
― Martin Fowler
45. Starting from the beginning of time, you are literally unrolling history to reach the state at a given point in time
Unrolling a stream of events
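Unrolling is just a fold over the event stream; a minimal sketch, assuming illustrative event types:

```python
def apply(state, event):
    # Fold a single event into the current state.
    # Event types and state shape are illustrative.
    if event["type"] == "UserRegistered":
        return {"username": event["username"], "logins": 0}
    if event["type"] == "UserLoggedIn":
        return {**state, "logins": state["logins"] + 1}
    return state  # unknown events leave the state untouched

def replay(events):
    """Rebuild state by unrolling the whole stream from the beginning.
    Events must be iterated in the order they were applied."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state

stream = [
    {"type": "UserRegistered", "username": "Flash"},
    {"type": "UserLoggedIn"},
    {"type": "UserLoggedIn"},
]
print(replay(stream))  # {'username': 'Flash', 'logins': 2}
```

Replaying only a prefix of the stream gives you the state at any earlier point in time.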
46. Idea #1
Every change to the state of your application is captured in an event object.
“UserLoggedIn”, “PaymentSent”, “UserLanded”
47. Idea #2
Events are stored in
the sequence they
were applied inside
an event store
49. Idea #4
One way to store data/events but potentially
infinite ways to read them.
A practical example
Tech ops, business control, monitoring, accounting: they are all interested in reading data through different views.
51. You start with this
{
"_id": ObjectId("123"),
"username": "Flash",
"city": …,
"phone": …,
"email": …
}
52. The more successful your company
is, the more people
…
The more people, the more views
53. With document databases it's magically easy to add new fields to your collections.
54. Soon you might end up with
{
"_id": ObjectId("123"),
"username": "Flash",
"city": …,
"phone": …,
"email": …,
"created_at": …,
"updated_at": …,
"ever_tried_to_purchase_something": …,
"canceled_at": …,
"acquisition_channel": …,
"terminated_at": …,
"latest_purchase_date": …,
…
}
55. A bomb waiting to detonate
It’s impossible to keep adding state changes to your
documents and then expect to be able to extract them with
a single query.
57. Event Store
● Engineered for event sourcing
● Supports projections
● By the father of CQRS (Greg Young)
● Great performance
http://geteventstore.com/
The bad
Based on Mono, still too unstable.
58. LevelWHEN
An event store built with Node.js and LevelDB
● Faster than light
● Completely custom, no tools to handle
aggregates
https://github.com/gabrielelana/levelWHEN
59. The known path
● PHP (any other language would do just fine)
● MongoDB 2.2.x
63. The write architecture

Service ---\
            \ [event payload]
Service ----+--> Queue System <------------> API -> MongoDB
            / [event payload]
Service ---/
65. MongoDB replica set
A MongoDB replica set with two logical dbs:
1. Event store where we would store events
2. Reporting DB where we would store
aggregates and final reports
68. Don’t trust the network: Idempotence
{
'_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
…
}
The _id field is defined client-side and ensures idempotence if an event is received twice.
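A minimal in-memory sketch of the idea, standing in for MongoDB's unique `_id` index (the class and event shape are illustrative, not the actual implementation):

```python
class EventStore:
    """In-memory stand-in for the events collection: the client-generated
    _id makes inserts idempotent, as MongoDB's unique _id index would."""

    def __init__(self):
        self._events = {}

    def insert(self, event):
        if event["_id"] in self._events:
            # Duplicate delivery over an unreliable network:
            # the unique index would reject it, so we ignore it.
            return False
        self._events[event["_id"]] = event
        return True

store = EventStore()
event = {"_id": "3318c11e-fe60-4c80-a2b2-7add681492d9", "type": "UserLoggedIn"}
print(store.insert(event))  # True: stored
print(store.insert(event))  # False: same _id, safely ignored
```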
69. Indexes
● Events collection is huge (~100*N documents)
● Use indexes wisely as they are necessary yet
expensive
● With suggested event structure:
{'data.meta.created_at': 1, 'type': 1}
70. Benchmarking
How many events/second can you store?
Our machines were able to store roughly 150 events/sec.
This number can be greatly increased with dedicated IOPS,
more aggressive inserting policies, etc...
71. Final tips
● Use SSD on your storage machines
● Pay attention to write concerns (w=majority)
● Test your replica set fault tolerance
78. Sequential projector 2/2
● It’s a good idea to select fixed-size batches to avoid memory problems when you load your cursor into memory
● Could be a long-running process selecting events as they arrive in real time
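A sketch of such a projector, where a plain iterable stands in for the MongoDB cursor and the counting read model is purely illustrative:

```python
def batches(cursor, size):
    """Yield fixed-size batches so the whole cursor never has to
    sit in memory at once."""
    batch = []
    for item in cursor:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # last, possibly smaller, batch

def project(events, apply_event, batch_size=100):
    """Sequential projector: fold events into the read model,
    batch by batch, in the order they were stored."""
    projection = {}
    for batch in batches(iter(events), batch_size):
        for event in batch:
            apply_event(projection, event)
    return projection

# Illustrative read model: count events per type.
def count_by_type(projection, event):
    projection[event["type"]] = projection.get(event["type"], 0) + 1

events = [{"type": "UserRegistered"}] + [{"type": "UserLoggedIn"}] * 5
print(project(events, count_by_type, batch_size=2))
# {'UserRegistered': 1, 'UserLoggedIn': 5}
```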
79. Event mapper 1/3
Translates event fields to the Read Model domain.
Takes an event as input, applies some logic, and returns a list of Read Model fields.
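A sketch of a mapper for a hypothetical UserRegistered event; the output field names follow the projections shown later, everything else is an assumption:

```python
def map_user_registered(event):
    """Translate a UserRegistered event into the fields the Read Model
    cares about. Note that not every event field is copied over."""
    data = event["data"]
    return {
        "user_id": data["user_id"],
        "user_name": data["username"],
        "email": data["email"],
        "registered_at": data["meta"]["created_at"],
    }

event = {
    "type": "UserRegistered",
    "data": {
        "user_id": 123,
        "username": "flash",
        "email": "a-dummy-email@gmail.com",
        "acquisition_channel": "organic",  # present in the event, irrelevant here
        "meta": {"created_at": "2014-11-21T00:00:01Z"},
    },
}
print(map_user_registered(event))
```

The `acquisition_channel` field is deliberately dropped: each read model keeps only what its consumers need.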
83. The Projection after event #1
db.users_conversion_rate_projection.findOne()
{
'user_id': 123,
'user_name': 'flash',
'email': 'a-dummy-email@gmail.com',
'registered_at': "2014-11-21T00:00:01Z"
}
84. The Projection after event #2
{
'user_id': 123,
'user_name': 'flash',
'email': 'a-dummy-email@gmail.com',
'registered_at': "2014-11-21T00:00:01Z",
'purchased_at': "2014-11-21" // Added this field and rewrote others
}
85. The Projection collection
{
'user_id': 123,
'user_name': 'flash',
'email': 'a-dummy-email@gmail.com',
'registered_at': "2014-11-21",
'purchased_at': "2014-11-21"
}
{
'user_id': 456,
'user_name': 'batman',
'email': 'a-dummy-email@gmail.com',
'registered_at': "2014-11-21",
'purchased_at': "2014-11-21"
}
{
'user_id': 789,
'user_name': 'superman',
'email': 'a-dummy-email@gmail.com',
'registered_at': "2014-12-21",
'purchased_at': "2014-12-21"
}
86. The Projection - A few thoughts
Note that we didn't copy all the available fields from the events to the projection, just the relevant ones.
87. From these two events we could have generated infinite read models, such as:
● List all purchased products and related amounts for the
company buyers
● Map all sales and revenues for our accounting dept
● List transactions for the financial department
90. The aggregation (2) - User with a purchase
var purchased = db.users_conversion_rate_projection.aggregate([
{
$match: {
"registered_at": { $gte: ISODate("2015-11-21"), $lte: ISODate("2015-11-22") },
"purchased_at": { $exists: true }
}
},
{
$group: {
_id: null,
count: { $sum: 1 }
}
}
]);
91. The aggregation (3) - Automate all the things
● You can easily create the aggregation framework statement by composition, abstracting the concept of a Column.
● This way you can dynamically aggregate your projections on (for example) an API request.
● If your Projector is a long running process, your projections
will be updated to the second and you automagically get
realtime data.
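Building the statement by composition can be sketched like this; the function and parameter names are assumptions, only the pipeline shape comes from the aggregation on slide 90:

```python
def conversion_pipeline(date_from, date_to, extra_match=None):
    """Compose the aggregation pipeline from request parameters
    instead of hardcoding it. Names are illustrative."""
    match = {"registered_at": {"$gte": date_from, "$lte": date_to}}
    if extra_match:
        # e.g. restrict to users with a purchase
        match.update(extra_match)
    return [
        {"$match": match},
        {"$group": {"_id": None, "count": {"$sum": 1}}},
    ]

# Same shape as the slide-90 aggregation, now parameterised per request:
pipeline = conversion_pipeline("2015-11-21", "2015-11-22",
                               {"purchased_at": {"$exists": True}})
print(pipeline)
```

An API endpoint could build the `extra_match` clause from query-string parameters and run the resulting pipeline against the projection collection.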