Building Resilient Distributed Systems by Using Caching Command and Rollback-Replay
1. PAGE 1 | GRACE HOPPER CELEBRATION 2016 | #GHC16
PRESENTED BY THE ANITA BORG INSTITUTE AND THE ASSOCIATION FOR COMPUTING MACHINERY
Tanuja Phadke
tanuja_phadke@intuit.com
The problem with resiliency in distributed systems
Single node system
[Diagram: one node containing a web container, caching, Database1, and Database2]
• All components reside on the same machine.
• It is not too hard to ensure atomicity.
• Either all operations occur or none do.
The problem with resiliency in distributed systems
[Diagram: components spread across several nodes (node1, node2)]
• Components are spread out.
• Maintaining atomicity and resiliency is a challenge.
• So we strive for eventual consistency.
• A change will eventually be propagated to all copies of the data.
Intuit case study: Login Service
Intuit makes financial software. Many of these products
use the Login service for login and fetching users’ bank
accounts and transactions securely.
Requirements for the Login Service
• Fast response times
• Resilience
• Fault-tolerance
• Consistency
4-step solution we used to solve the problem
1. Decouple design
a. Implement single responsibility principle (SRP)
b. Use the command pattern
2. Use circuit breaker framework
3. Use reactor to recover
4. Use caching (record)
1. Decouple design
• Individual components can be developed independently.
• Components can be plugged into a bigger solution.
1a. Implement single responsibility principle
• A module or class should have responsibility for a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.
• Separation of concerns
• Each module/method does only one task.
1b. Use command pattern
[Class diagram: Client, Invoker, a Command interface declaring execute() and recover(), and Concrete Command A and Concrete Command B, which implement it. Edge labels: creates, uses, implements.]
A behavioral design pattern in which an object is used to encapsulate all
information needed to perform an action or trigger an event at a later time.
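As a minimal sketch of the pattern, the interface below mirrors the execute() and recover() operations shown on the slide. The concrete command (UpdateEmailCommand), the dictionary used as a data store, and the Invoker's behavior on failure are assumptions made for illustration, not the presenter's code.

```python
from abc import ABC, abstractmethod


class Command(ABC):
    """Mirrors the Command interface on the slide: execute() and recover()."""

    @abstractmethod
    def execute(self):
        """Perform the action."""

    @abstractmethod
    def recover(self):
        """React to a failure, for example by rolling back."""


class UpdateEmailCommand(Command):
    """Hypothetical concrete command: changes one user's email in a dict store."""

    def __init__(self, store, user_id, new_email):
        self.store = store
        self.user_id = user_id
        self.new_email = new_email
        self.previous_email = None
        self.applied = False

    def execute(self):
        # Remember the old value so recover() can undo the change.
        self.previous_email = self.store.get(self.user_id)
        self.store[self.user_id] = self.new_email
        self.applied = True

    def recover(self):
        if not self.applied:
            return  # nothing was changed, so there is nothing to undo
        if self.previous_email is None:
            self.store.pop(self.user_id, None)
        else:
            self.store[self.user_id] = self.previous_email
        self.applied = False


class Invoker:
    """Runs any command without knowing what it does."""

    def run(self, command):
        try:
            command.execute()
            return True
        except Exception:
            # Assumed policy: let the command clean up after its own failure.
            command.recover()
            return False
```

Because the command carries everything it needs, the same object can be executed now and recovered later by a different part of the system.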
Benefits of the command pattern
• Each command knows how to execute itself.
• Each command knows how to react to failures:
• Rollback
• Retry
• Some other recovery action
Traditional model with services
[Diagram: the Orchestration Handler receives GET and PUT requests and calls Service A and Service B; each service exposes create, update, delete, and get.]
Introduce commands
[Diagram: the Orchestration Handler receives GET and PUT requests; a Command (create, update, ...) is added for each of Service A and Service B, which expose create, update, delete, and get.]
2. Use circuit breaker
• A circuit breaker detects failures and encapsulates the logic for reacting to them (during maintenance, temporary external system failures, or unexpected system difficulties).
• The circuit breaker pattern is a stability pattern often applied in RESTful architectures.
• Several open-source implementations are available (Hystrix, developed by Netflix, is a popular one).
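The behavior described above can be sketched in a few lines. This is a simplified, single-threaded illustration and not the Hystrix API: the class name, the failure_threshold and reset_timeout parameters, and the injectable clock are all assumptions chosen for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the timeout can be tested
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a service known to be down.
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: fall through and allow a trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or reopen it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would typically catch CircuitOpenError and return a fallback or an error response right away, which is what keeps a struggling downstream service from tying up the caller's resources.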
Example of circuit breaker
3. Use reactor
• The reactor is invoked when a failure occurs.
• We can specify its behavior:
• Rollback
• Retry
• Trigger a back-up
• Fallback
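One way to combine these behaviors is sketched below: retry first, then roll back, then fall back. The Reactor class, its on_failure method, the ordering of the three reactions, and the FlakyCommand used to exercise it are all hypothetical; the slides do not specify how the reactor was implemented.

```python
class Reactor:
    """Hypothetical reactor: retry a failed command, then roll back and fall back."""

    def __init__(self, max_retries=2, fallback=None):
        self.max_retries = max_retries
        self.fallback = fallback

    def on_failure(self, command):
        # Retry: the failure may have been transient.
        for _ in range(self.max_retries):
            try:
                return command.execute()
            except Exception:
                continue
        # Rollback: ask the command to undo whatever it changed.
        command.recover()
        # Fallback: return a default result if one was configured.
        return self.fallback() if self.fallback is not None else None


class FlakyCommand:
    """Hypothetical command that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.recovered = False

    def execute(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("service unavailable")
        return "done"

    def recover(self):
        self.recovered = True
```

Keeping the reaction policy in the reactor, and the undo logic in each command's recover(), means the policy can change without touching the commands.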
Use circuit breaker
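[Diagram: the Orchestration Handler receives GET and PUT requests and reaches Service A and Service B (create, update, delete, get) through Commands guarded by a circuit breaker. The circuit breaker's failure path is labeled: fallback, short circuit, log error, error response.]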
4. Use caching
• Use a cache to save the commands so that they can be used for recovery.
• Some popular open-source solutions:
• Hazelcast
• Memcached
• Redis
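A rough sketch of recording commands for later recovery follows. An in-memory dictionary stands in for a distributed cache such as Hazelcast, Memcached, or Redis, and the CommandCache class, its method names, and the JSON encoding are assumptions for illustration only.

```python
import json


class CommandCache:
    """In-memory stand-in for a distributed cache (an assumption for this sketch)."""

    def __init__(self):
        self._entries = {}

    def record(self, request_id, name, params):
        # Store each command as JSON so that any node could read it back.
        entry = self._entries.setdefault(request_id, [])
        entry.append(json.dumps({"name": name, "params": params}))

    def replay(self, request_id):
        # Return the recorded commands in the order they were issued.
        return [json.loads(raw) for raw in self._entries.get(request_id, [])]

    def discard(self, request_id):
        # Drop the record once the request no longer needs recovery.
        self._entries.pop(request_id, None)
```

Recording a serializable description of each command, rather than a live object, is what would let a different node pick up the recovery work.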
4. Use caching
[Diagram: the Orchestration Handler receives GET and PUT requests and calls Service A and Service B (create, update, delete, get). The failure path is labeled: fallback, short circuit, log error, error response. A cache client writes to the cache, and a cache listener acts as the Reactor.]
Full resilient picture
[Diagram: the Orchestration Handler receives GET and PUT requests and calls Service A and Service B; each service exposes create, update, delete, and get.]
Service A fails
[Diagram: the call from the Orchestration Handler to Service A fails. The failure path runs: fallback, short circuit, log error, error response.]
Service A is successful and Service B fails
[Diagram: the call to Service A succeeds and the call to Service B fails. The failure path runs (fallback, short circuit, log error, error response), the cache entry is marked dirty, and the cache listener, acting as the Reactor, invokes recover.]
Service A and Service B both succeed
[Diagram: the calls to Service A and Service B both succeed. The cache client records the commands (Caching(record)) and the cache entry is not dirty.]
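The three scenarios (Service A fails, Service A succeeds and Service B fails, both succeed) can be sketched end to end as follows. This is an interpretation of the diagrams under stated assumptions: a plain dictionary stands in for the cache, the listener is called synchronously, recovery means rolling back completed commands in reverse order, and the names Step, reactor, and handle are invented for the example.

```python
class Step:
    """Hypothetical command wrapping one service call."""

    def __init__(self, name, log, fail=False):
        self.name = name
        self.log = log
        self.fail = fail

    def execute(self):
        if self.fail:
            raise RuntimeError(self.name + " failed")
        self.log.append("execute " + self.name)

    def recover(self):
        self.log.append("recover " + self.name)


def reactor(entry):
    # Cache listener: on a dirty entry, roll back completed commands, newest first.
    if entry["dirty"]:
        for command in reversed(entry["done"]):
            command.recover()


def handle(request_id, commands, cache, listener=reactor):
    entry = {"done": [], "dirty": False}
    cache[request_id] = entry
    for command in commands:
        try:
            command.execute()
        except Exception:
            # Dirty only if earlier commands left partial changes behind.
            entry["dirty"] = bool(entry["done"])
            listener(entry)
            return "error"
        entry["done"].append(command)  # record the successful command
    return "ok"
```

When the first command fails there is nothing to undo, so the entry stays clean and the caller just gets an error response. When a later command fails, the dirty flag is what tells the listener that recovery is needed.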
4-step solution we used to solve the problem
1. Decouple design
a. Implement single responsibility principle (SRP)
b. Use the command pattern
2. Use circuit breaker framework
3. Use reactor to recover
4. Use caching (record)
Our story: How did we benefit?
Over 100 user update requests were failing.
• They got slow responses.
• Resulted in high CPU utilization and cascading failures.
After we implemented this solution, we failed fast and
could adhere to the SLAs.
For more info ...
Retry pattern
https://msdn.microsoft.com/en-us/library/dn589788.aspx
Command Handling
http://www.axonframework.org/docs/2.0/command-handling.html
Thank you
Editor's Notes
We can choose a cache based on how fast reads and writes need to be. For Memcached, reads are the fastest; also consider the maximum size of a value.
http://blog.engineering.aol.com/2015/08/28/a-comparative-study-of-distributed-caches/