Architecture of Falcon, a new backend chat messaging system built on Scala
2017/02/27 © ChatWork All rights reserved. 2
Goal of Architecture
• Scalability:
• linear increase of throughput by adding nodes
• keep latency low and stable
• High Performance:
• handle 100 times the current throughput without further architectural changes
• Resiliency:
• avoid chain reactions of failures
• fast recovery from partial failure
• Low cost:
• keep cluster size as small as possible
• withstand temporary load spikes without additional resources
• high performance/resource ratio
• Legacy system integration
• keep consistency without transactions
Architecture Overview
• “Write API” exposes an asynchronous API. It persists an event and immediately returns `202 Accepted`. It performs no queries and no mutations. Its storage is Kafka.
• “Read API” can only query the read model and performs no mutations. It supports both query by key and query by key range. Its storage is HBase.
• ReadModelUpdater is a Kafka consumer that builds the read model queried by Read API from the events generated by Write API.
• PostProcessorForwarder is a Kafka consumer that notifies the legacy PHP system to execute the remaining transactions, e.g. push notifications.
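Here is a minimal sketch of the two Read API query shapes. It uses an in-memory `TreeMap` as a stand-in for HBase, which keeps rows sorted by row key. The zero-padded `<roomId>:<messageId>` row-key layout is a hypothetical assumption. The names `MessageView`, `RowKey` and `ReadModelStore` are also hypothetical and do not come from Falcon.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical read-model row for a posted message.
final case class MessageView(roomId: Long, messageId: Long, body: String)

object RowKey {
  // Assumed row-key layout: zero-padded "<roomId>:<messageId>", so the rows
  // of one room are adjacent and sorted by message ID, as in an HBase table.
  def apply(roomId: Long, messageId: Long): String = f"$roomId%010d:$messageId%010d"
}

// In-memory stand-in for an HBase table: a map sorted by row key.
final case class ReadModelStore(rows: TreeMap[String, MessageView]) {
  def put(v: MessageView): ReadModelStore =
    copy(rows = rows.updated(RowKey(v.roomId, v.messageId), v))

  // Query by key: single-row lookup.
  def get(key: String): Option[MessageView] = rows.get(key)

  // Query by key range: rows with from <= key < until, like a scan with start and stop rows.
  def scan(from: String, until: String): List[MessageView] =
    rows.range(from, until).values.toList
}

val store = List(
  MessageView(1L, 1L, "hello"),
  MessageView(1L, 2L, "world"),
  MessageView(2L, 1L, "other room")
).foldLeft(ReadModelStore(TreeMap.empty[String, MessageView]))((s, v) => s.put(v))

// All messages of room 1: scan from the first possible key of room 1 up to room 2.
val room1 = store.scan(RowKey(1L, 0L), RowKey(2L, 0L)).map(_.body)
println(room1) // List(hello, world)
```

Putting the room ID first in the row key is what turns "all messages of a room" into one contiguous range scan.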
CQRS: Command Query Responsibility Segregation
• Command and Query responsibility is segregated at system level.
• Specialized responsibilities keep each system simple.
• Each system uses a different model:
• “Write API” uses immutable events to represent the history of user actions.
• “Read API” uses read models optimized for queries.
• The command and query systems each use dedicated storage.
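The following is a hedged Scala sketch of this split. It assumes hypothetical event types `MessagePosted` and `MessageDeleted`. The command side only appends immutable events and answers `202` at once. The query side derives a separate read model by folding over the events. None of these names come from Falcon.

```scala
// Hypothetical events: immutable facts about what users did.
sealed trait Event { def roomId: Long }
final case class MessagePosted(roomId: Long, messageId: Long, author: String, body: String) extends Event
final case class MessageDeleted(roomId: Long, messageId: Long) extends Event

// Command side: append the event and answer 202 Accepted at once.
// No query and no read-model update happen here.
def accept(log: Vector[Event], e: Event): (Vector[Event], Int) = (log :+ e, 202)

// Query side: a read model shaped for "show a room's timeline".
final case class RoomTimeline(messages: Vector[(Long, String)]) {
  def applyEvent(e: Event): RoomTimeline = e match {
    case MessagePosted(_, id, author, body) => copy(messages = messages :+ (id -> s"$author: $body"))
    case MessageDeleted(_, id)              => copy(messages = messages.filterNot(_._1 == id))
  }
}

object Projection {
  // The read model is derived purely from the event history.
  def timeline(events: Seq[Event]): RoomTimeline =
    events.foldLeft(RoomTimeline(Vector.empty))((tl, e) => tl.applyEvent(e))
}

val commands: Vector[Event] = Vector(
  MessagePosted(1L, 1L, "alice", "hi"),
  MessagePosted(1L, 2L, "bob", "hey"),
  MessageDeleted(1L, 1L)
)
val (eventLog, statuses) = commands.foldLeft((Vector.empty[Event], Vector.empty[Int])) {
  case ((log, codes), e) =>
    val (next, code) = accept(log, e)
    (next, codes :+ code)
}
val timeline = Projection.timeline(eventLog)
println(statuses)          // Vector(202, 202, 202)
println(timeline.messages) // Vector((2,bob: hey))
```

The event log keeps the full history, including the deleted message. Only the read model reflects the deletion.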
Convergent Evolution of technology
“Convergent evolution is the independent evolution of similar features in species of different lineages.”
https://en.wikipedia.org/wiki/Convergent_evolution
• DDD: fighting against the complexity of the domain model with Event Sourcing
• Big Data: fighting against the complexity of big data with Log Processing
https://www.infoq.com/news/2016/05/event-sourcing-stream-processing
Two communities invented similar features independently.
Falcon is influenced by the knowledge of both communities.
Inter-system Synchronization
• Falcon subsystems and the PHP system are so-called “microservices”.
• Microservices do not share persistent storage.
• Event Sourcing synchronizes the systems with these properties:
• No events are lost (within the retention period).
• The order of message events is preserved within a chat room.
• Events are processed at least once.
• Processing the same event twice has no additional effect (idempotent).
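One common way to get idempotence under at-least-once delivery is to remember the last applied offset per partition. Anything at or below that offset is skipped. The sketch below assumes this technique; the deck does not say how Falcon achieves idempotence. `Record` and `ConsumerState` are hypothetical names.

```scala
// Hypothetical record as seen by a consumer: partition, offset and the room it concerns.
final case class Record(partition: Int, offset: Long, roomId: Long)

// Assumed dedup technique: remember the last applied offset per partition.
final case class ConsumerState(lastOffset: Map[Int, Long], unread: Map[Long, Int]) {
  def process(r: Record): ConsumerState =
    if (lastOffset.get(r.partition).exists(_ >= r.offset)) this // redelivered: no effect
    else ConsumerState(
      lastOffset.updated(r.partition, r.offset),
      // The effect itself (an increment) is NOT idempotent; the offset check makes it so.
      unread.updated(r.roomId, unread.getOrElse(r.roomId, 0) + 1)
    )
}

// Offset 1 is delivered twice, as can happen after a consumer crash and rebalance.
val deliveries = List(Record(0, 0L, 7L), Record(0, 1L, 7L), Record(0, 1L, 7L))
val result = deliveries.foldLeft(ConsumerState(Map.empty, Map.empty))((s, r) => s.process(r))
println(result.unread) // Map(7 -> 2)
```

The check relies on events arriving in order within a partition. The offset and the effect live in one state value, so they advance together. In a real system they would also need to be persisted together for the skip check to stay sound.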
Kafka features helpful for Event Sourcing
• auto-sharding
• Events are partitioned to be processed in parallel.
• strong consistency:
• Each partition is processed by a single consumer (within a consumer group).
• A consumer can therefore keep internal state.
• Resilient:
• A partition assigned to a crashed consumer is automatically reassigned to another consumer.
• Easy to connect services
• Events can be forwarded to the next topic.
(Figure: events forwarded between topic 1, topic 2, topic 3 and topic 4)
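This sketch shows why keying events by chat room preserves per-room order. Every event with the same key maps to the same partition, and a partition is an ordered log. `partitionFor` is a simplified stand-in using `Math.floorMod` of the key's hash. It is not Kafka's default partitioner, which hashes the serialized key with murmur2. Using the room ID as the key is an assumption based on the per-room ordering guarantee stated earlier.

```scala
// Hypothetical event keyed by chat room; `seq` is its position within the room.
final case class RoomEvent(roomId: Long, seq: Int)

// Simplified key-based partitioner (NOT Kafka's murmur2-based default):
// the same room ID always yields the same partition.
def partitionFor(roomId: Long, numPartitions: Int): Int =
  Math.floorMod(roomId.##, numPartitions)

// Append each event to its partition's log; each log keeps arrival order.
def partitionAll(events: Seq[RoomEvent], numPartitions: Int): Map[Int, Vector[RoomEvent]] =
  events.foldLeft(Map.empty[Int, Vector[RoomEvent]]) { (logs, e) =>
    val p = partitionFor(e.roomId, numPartitions)
    logs.updated(p, logs.getOrElse(p, Vector.empty) :+ e)
  }

val events = Vector(RoomEvent(1L, 1), RoomEvent(2L, 1), RoomEvent(1L, 2), RoomEvent(3L, 1), RoomEvent(1L, 3))
val logs = partitionAll(events, numPartitions = 4)
// Room 1's events come out of its partition in the order they were produced.
println(logs(partitionFor(1L, 4)).filter(_.roomId == 1L).map(_.seq)) // Vector(1, 2, 3)
```

Different rooms can share a partition, which is harmless. Order only needs to hold within a room, and different partitions can be consumed in parallel.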
• A subsystem may temporarily perform poorly because of:
• load spikes
• HBase compaction
• legacy PHP system failure caused by process saturation
• AWS component failure
• Using Kafka as command-side storage helps protect subsystems:
• Kafka can easily absorb events produced at 40 times the normal load without scaling out.
• Kafka consumers consume events at a stable rate, so each subsystem deals with predictable throughput.
• Throttling of Kafka consumption enforces an upper limit on throughput.
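The deck does not say how throttling is implemented. A token bucket is one standard way to cap throughput while still allowing short bursts. The sketch below shows that generic technique, with time passed in explicitly so the behaviour is deterministic. `TokenBucket` and its parameters are hypothetical.

```scala
// Generic token bucket (hypothetical parameters): sustained rate `ratePerSecond`,
// bursts of at most `capacity` events. Time is passed in, in nanoseconds;
// a real consumer would use System.nanoTime().
final class TokenBucket(ratePerSecond: Double, capacity: Double, startNanos: Long) {
  private var tokens: Double = capacity
  private var lastNanos: Long = startNanos

  // Returns true if one event may be processed at time `nowNanos`.
  def tryAcquire(nowNanos: Long): Boolean = {
    val elapsedSeconds = (nowNanos - lastNanos) / 1e9
    tokens = math.min(capacity, tokens + elapsedSeconds * ratePerSecond)
    lastNanos = nowNanos
    if (tokens >= 1.0) { tokens -= 1.0; true } else false
  }
}

val bucket = new TokenBucket(ratePerSecond = 100.0, capacity = 10.0, startNanos = 0L)
// A spike of 50 events at t = 0 s: only the burst capacity passes.
val passedAtStart = (1 to 50).count(_ => bucket.tryAcquire(0L))
// One second later the bucket has refilled, but never beyond its capacity.
val passedAfterOneSecond = (1 to 50).count(_ => bucket.tryAcquire(1000000000L))
println((passedAtStart, passedAfterOneSecond)) // (10,10)
```

In a consumer, an event that does not get a token would simply wait in Kafka until a later attempt. Throttling bounds the processing rate without dropping events.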
1. SQL query latency increased on the Amazon Aurora database of the PHP system.
2. PostProcessorForwarder's calls to the PHP system timed out.
3. Events stacked up on the queue in Kafka.
4. Throughput of event processing decreased. Once the subsystem recovered from the failure, throughput increased to consume the stacked events, but it never exceeded the upper limit, thanks to throttling.
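The incident timeline above can be reproduced with a toy discrete-time model. All numbers are hypothetical. The outage is simplified to zero throughput while the PHP system is failing, although in reality throughput only decreased. Events keep arriving in Kafka and the backlog grows. After recovery, the backlog drains at no more than the throttle limit.

```scala
// Toy model, hypothetical numbers: per tick, `arrivals` events land in Kafka;
// the consumer handles at most `limit` of them while the downstream is up,
// and none while it is down. Unprocessed events stay queued (the backlog).
final case class Tick(processed: Int, backlog: Int)

def simulate(arrivals: Seq[Int], downstreamUp: Seq[Boolean], limit: Int): Vector[Tick] = {
  val (_, history) = arrivals.zip(downstreamUp).foldLeft((0, Vector.empty[Tick])) {
    case ((backlog, acc), (arrived, up)) =>
      val available = backlog + arrived
      val processed = if (up) math.min(available, limit) else 0
      val remaining = available - processed
      (remaining, acc :+ Tick(processed, remaining))
  }
  history
}

val ticks = simulate(
  arrivals = Vector.fill(8)(10),
  downstreamUp = Vector(true, true, false, false, false, true, true, true),
  limit = 15
)
println(ticks.map(_.processed)) // Vector(10, 10, 0, 0, 0, 15, 15, 15)
println(ticks.map(_.backlog))   // Vector(0, 0, 10, 20, 30, 25, 20, 15)
```

After recovery the consumer runs at the cap rather than at the backlog's size. The downstream PHP system therefore never sees a burst larger than the limit.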
ACID semantics of Falcon
• Atomicity: posting a message and its associated operations, e.g. unread count calculation, are not atomic. Intermediate states can be observed.
• Consistency: eventual consistency. See “Consistency Model”.
• Isolation: the same record is never mutated concurrently, because events are processed sequentially. No isolation is needed.
• Durability: yes. No messages are lost.
• Visibility: no guarantee. For a short time after posting, a message may not be observable. We try to keep that time as short as possible.
ACID does not provide high availability and scalability.
Falcon does not have ACID properties.
http://people.eecs.berkeley.edu/~brewer/cs262b/TACC.pdf
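Here is a minimal sketch of what "no atomicity, intermediate states can be observed" means. It assumes a hypothetical `View` that holds both the messages and the unread count. The two updates are applied as separate steps, and `scanLeft` records every state a reader could see.

```scala
// Hypothetical read model with two parts that are updated independently.
final case class View(messages: Vector[String], unread: Int)

// Posting a message and recalculating the unread count are separate steps,
// not one transaction.
val steps: List[View => View] = List(
  v => v.copy(messages = v.messages :+ "hello"), // step 1: the message becomes visible
  v => v.copy(unread = v.unread + 1)             // step 2: the unread count catches up
)

// scanLeft keeps every state a concurrent reader could observe.
val observed: List[View] = steps.scanLeft(View(Vector.empty, 0))((v, step) => step(v))
observed.foreach(println)
// View(Vector(),0)
// View(Vector(hello),0)   <- intermediate state: message present, count not yet updated
// View(Vector(hello),1)
```

The final state is correct (eventual consistency), but the middle state is visible for a short time. This is the trade-off accepted in place of a transaction.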
Consistency Model
Choose C or A for each subsystem, based on the CAP theorem.
(Figure: the subsystems, each labeled C or A)
• Availability for the user-facing subsystems, Write API and Read API:
• They must always be writable and readable. Losing availability means the service is down.
• Consistency for the background subsystems, ReadModelUpdater and PostProcessorForwarder:
• They must keep their internal state consistent. Losing their availability is not noticeable to users.
Recovery from human errors
• Falcon can recover from data corruption without stopping the service.
• The only subsystem that could damage data when malfunctioning is ReadModelUpdater.
• “Write API”, “Read API”, “PostProcessorForwarder” cannot
mutate data.
• Since input events are preserved in Kafka, output can be
recalculated by resetting offsets of Kafka consumer.
• Stopping ReadModelUpdater does not affect availability of
service.
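The replay-based recovery can be sketched as follows. A `Vector` stands in for the retained Kafka topic, and a hypothetical message count per room serves as the read model. Resetting the offset to 0 and re-running a fixed projection rebuilds the read model from scratch. With a real Kafka consumer, this would correspond to something like `KafkaConsumer.seekToBeginning` or the `kafka-consumer-groups` tool's `--reset-offsets` option.

```scala
// Hypothetical event and retained log (stand-in for a Kafka topic within retention).
final case class Posted(roomId: Long, body: String)

val eventLog: Vector[Posted] = Vector(Posted(1L, "a"), Posted(1L, "b"), Posted(2L, "c"))

// Hypothetical read model: number of messages per room.
type Counts = Map[Long, Int]

// Consume the log from `fromOffset`, applying `project` to each event in order.
def consume(log: Vector[Posted], fromOffset: Int, start: Counts, project: (Counts, Posted) => Counts): Counts =
  log.drop(fromOffset).foldLeft(start)(project)

// A malfunctioning projection that counts every message twice...
val buggy: (Counts, Posted) => Counts = (m, e) => m.updated(e.roomId, m.getOrElse(e.roomId, 0) + 2)
// ...and its fix.
val fixed: (Counts, Posted) => Counts = (m, e) => m.updated(e.roomId, m.getOrElse(e.roomId, 0) + 1)

val corrupted = consume(eventLog, 0, Map.empty, buggy) // Map(1 -> 4, 2 -> 2)
// Recovery: discard the corrupted read model and reset the offset to 0.
// The input events are untouched, so the output can simply be recalculated.
val recovered = consume(eventLog, 0, Map.empty, fixed) // Map(1 -> 2, 2 -> 1)
```

Only ReadModelUpdater's output is rebuilt. Write API keeps accepting events meanwhile, so the replay does not stop the service.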