2. Big System’s Targets
• High Performance
• High Scalability
• High Reliability
• Low Maintenance Cost
3. Problems
• IO Bottleneck
• Scaling Processing
• Handling a Huge Number of Concurrent Requests
• Availability and Partition Tolerance
• Dealing with Consistency and Concurrency
4. Split to Scale
• If you can’t split it, you can’t scale it
• From Monolithic System to Distributed System
• From One to Many Processing
• From One to Many Persistence Stores
• From Single to Parallel Processing
• From Synchronous to Asynchronous
5. Data Replication
• Every node needs a way to communicate with the other nodes
• Data replication is the most important part of a distributed system
• The reliability of a system depends on how it replicates data
10. 1. Guarantee No Lost Data
We usually do both:
- Write data to the database
- Send a message to the message queue
Problems, in practice:
- The write succeeds but the send fails
- The send succeeds but the write fails
[Diagram: Processing writes to both Database and Message Queue]
11. 1. Guarantee No Lost Data
• Solutions:
• Use a one-way data flow:
Process -> Database -> Message Queue
• Use a transaction log to dispatch data changes
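The one-way flow above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes SQLite as the database, and the table names `deposits` and `outbox`, the functions `record_deposit` and `dispatch_pending`, and the `send` callback are all hypothetical. The business row and the outgoing message are committed in one transaction, and a separate step later forwards the logged messages to the queue.

```python
import sqlite3


def open_db():
    # In-memory database for illustration; table names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deposits (id INTEGER PRIMARY KEY, account TEXT, amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT, dispatched INTEGER DEFAULT 0)"
    )
    return conn


def record_deposit(conn, account, amount):
    # The data row and the outgoing message commit together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO deposits (account, amount) VALUES (?, ?)", (account, amount)
        )
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (f"deposit:{account}:{amount}",)
        )


def dispatch_pending(conn, send):
    # Forward undispatched messages in log order. If send() raises, the row
    # stays undispatched and is picked up again on the next call.
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE dispatched = 0 ORDER BY seq"
    ).fetchall()
    sent = 0
    for seq, payload in rows:
        send(payload)
        with conn:
            conn.execute("UPDATE outbox SET dispatched = 1 WHERE seq = ?", (seq,))
        sent += 1
    return sent
```

Because a message is marked as dispatched only after `send` returns, a crash between the two steps can resend a message. That is why this approach is usually paired with duplicate filtering on the consumer side.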
13. 2. Guarantee Sending Ordering
• Problems:
• Concurrent requests each send one message at the same time, in different threads
• Any one of the sends can fail
• As a result, the messages are sent out of order
14. 2. Guarantee Sending Ordering
• Use a transaction log to append undispatched messages in order
• Send the undispatched messages to the message queue asynchronously
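A small in-memory sketch of this idea, under stated assumptions: the class `OrderedOutbox` and its methods are hypothetical names, the log is a plain list rather than a durable transaction log, and `flush` would normally be called from a background thread. The key point is that the sender stops at the first failure, so a later message can never overtake an earlier one.

```python
import threading


class OrderedOutbox:
    """Append-only log of messages plus a cursor to the first undispatched one."""

    def __init__(self):
        self._log = []
        self._next = 0
        self._lock = threading.Lock()

    def append(self, message):
        # Request threads only append; the lock fixes a single global order.
        with self._lock:
            self._log.append(message)

    def flush(self, send):
        # Send undispatched messages strictly in log order. On the first
        # failure, stop and keep the cursor so the same message is retried.
        while True:
            with self._lock:
                if self._next >= len(self._log):
                    break
                message = self._log[self._next]
            try:
                send(message)
            except Exception:
                break
            self._next += 1
        return self._next
```

This sketch assumes a single dispatcher calls `flush`; running several dispatchers at once would reintroduce the ordering problem.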
16. 3. Guarantee Delivery Ordering
[Diagram: events E1, E2, E3, E4 in a queue, dequeued by Worker 1, Worker 2, Worker 3, Worker 4]
- Events are dequeued concurrently by many workers
- The message queue can guarantee first in, first out
- A later event can finish processing before an earlier one -> ordering is lost
17. 3. Guarantee Delivery Ordering
• Solutions:
• With RabbitMQ/ActiveMQ: use only one consumer per queue
• With Kafka: Kafka guarantees delivery order within each partition. Only one consumer (thread) in a consumer group receives messages from a given partition
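The per-partition guarantee is useful only if related events land in the same partition. The sketch below simulates that routing in plain Python without any Kafka client. The functions `partition_for` and `route` are hypothetical, and the CRC32 hash is an illustrative stand-in, not Kafka's actual partitioner.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # A stable hash, so the same key always maps to the same partition.
    # Illustrative only; this is not the hash Kafka itself uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    # events is an iterable of (key, payload) pairs in production order.
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions
```

With one consumer per partition, all events for a given key (for example, one account) are processed in the order they were produced, while different keys can still be processed in parallel.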
18. 4. Idempotency Filtering
• This is about duplicate messages
• A message can be delivered more than once
• Example: a deposit is applied twice because the deposit message is received twice
19. 4. Idempotency Filtering
• Solutions:
• Use a UUID/GUID v4 as the message ID
• Use the message's timestamp or version to detect duplicates
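A minimal sketch of the message-ID approach, using the deposit example from the previous slide. The names `IdempotentHandler` and `new_deposit` are hypothetical, and the set of seen IDs is kept in memory here; a real consumer would most likely persist it together with the balance update.

```python
import uuid


class IdempotentHandler:
    def __init__(self):
        self.seen = set()  # IDs of messages already applied
        self.balance = 0

    def handle(self, message):
        # Skip any message whose ID has already been processed.
        if message["id"] in self.seen:
            return False
        self.balance += message["amount"]
        self.seen.add(message["id"])
        return True


def new_deposit(amount):
    # The producer assigns a random (version 4) UUID to each message.
    return {"id": str(uuid.uuid4()), "amount": amount}
```

Delivering the same deposit message twice changes the balance only once: the second call to `handle` returns `False` and leaves the state untouched.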
20. 5. Versioning Messages
• Replicated data is always eventually consistent
• Sometimes we need to know how stale the data is
[Diagram: versions V5, V4, V3, V2 in flight; the writer has written V5 while a reader still reads V1]
21. 5. Versioning Messages
• Use a timestamp
• Use an incremental version (integer)
• Guarantee that the version increases consistently when data is written
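One way to picture the integer-version option on the replica side is sketched below. It assumes the writer assigns strictly increasing versions; the class `VersionedReplica` and its methods are hypothetical names used only for illustration.

```python
class VersionedReplica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Accept an update only if it is newer than what the replica holds,
        # so late or duplicated messages cannot roll the data back.
        if version <= self.version:
            return False
        self.version, self.value = version, value
        return True

    def staleness(self, latest_version):
        # How many versions this replica is behind the writer.
        return max(0, latest_version - self.version)
```

A replica that holds version 3 while the writer is at version 5 reports a staleness of 2, which gives a reader a concrete measure of how outdated its data is.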
22. 6. Non Blocking IO
• How to handle millions of messages in a queue?
• Solutions:
• Process messages in a pipeline.
• Split processing into three separate phases: receiving, handling and completing messages
• Each phase runs in parallel with the others
[Diagram: receiving -> handling -> completing]
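The three phases can be sketched with `asyncio`, where each phase is a coroutine connected to the next by a bounded queue. This is an assumption-laden illustration: the function names, the queue size of 100, and the `upper()` call standing in for real handling are all hypothetical, and the phases here run concurrently on one event loop rather than on separate cores.

```python
import asyncio


async def receive(source, out_q):
    # Phase 1: pull raw messages from the source and pass them on.
    for raw in source:
        await out_q.put(raw)
    await out_q.put(None)  # sentinel: no more messages


async def handle(in_q, out_q):
    # Phase 2: process each message without blocking the other phases.
    while (msg := await in_q.get()) is not None:
        await asyncio.sleep(0)  # placeholder for non-blocking I/O
        await out_q.put(msg.upper())
    await out_q.put(None)


async def complete(in_q, done):
    # Phase 3: finish each message, e.g. acknowledge it to the broker.
    while (msg := await in_q.get()) is not None:
        done.append(msg)


async def run_pipeline(source):
    q1 = asyncio.Queue(maxsize=100)
    q2 = asyncio.Queue(maxsize=100)
    done = []
    await asyncio.gather(receive(source, q1), handle(q1, q2), complete(q2, done))
    return done
```

The bounded queues give back-pressure: if handling falls behind, the receiving phase pauses instead of buffering an unbounded number of messages in memory.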
23. 7. Capture Data Changes
• This is a way to capture data changes in the DB and replicate them to the message queue
• Use a DB-specific mechanism to detect the data changes
24. MySQL Bin Log
• Decode the MySQL bin log to learn about new data changes
[Diagram: MySQL -> MySQL Binlog -> Event Handler (Decode Bin Log) -> Message Queue]
25. PostgreSQL Notification
• Use PostgreSQL notifications to signal data changes
[Diagram: PostgreSQL -> Notify -> Notification Receiver -> Message Queue]
26. Thank You
• Contact: Lê Minh Nghĩa
• Email: nghia.fit@gmail.com
• Facebook: /nghialeminh