2. I’ll talk about
● Scalability truths
● Event driven systems/architecture (EDA)
● Complexity of maintaining State in EDA
● Stateful EDA = Workflow
● Serverless architecture: why it is useful, and its primitives
● Design for a Serverless Stateful EDA platform
● The Flux project
4. Things you hear
● Our platform is on the Cloud and we can scale seamlessly, often implying:
○ No impact on Performance - user perceived latencies
○ No impact on Data - all transactions are committed, data is durable and consistent
● We follow SOA and our services are distributed across XX servers, often implying:
○ Our services are Stateless
○ Workloads are mostly identical and therefore can be serviced by any server
○ Our systems can keep up with the growth in available memory & compute
○ Network bandwidth is not a constraint & data transfer within the DC is a non-issue
○ There is no need for Consensus, or there are no faulty processes
5. “There is no such thing as a ‘stateless’ architecture; it's just someone else's problem.” (Jonas Bonér, CTO Lightbend)
Truths
ACID 1.0: Atomicity, Consistency, Isolation, Durability
● Full Transaction support
● No Data Staleness
● Strict Ordering
● Zero Data Loss
ACID 2.0: Associative, Commutative, Idempotent, Distributed (with Scale)
● Limited Transaction support
● Eventually Consistent
● Relaxed Ordering
● High Availability
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” (Leslie Lamport)
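The ACID 2.0 properties above (associative, commutative, idempotent) are what let replicas converge without coordination. A minimal sketch, assuming a grow-only set (G-Set) as the replicated state; the G-Set is one common illustration chosen here, not something the slides prescribe, and `GSet`/`GSetDemo` are hypothetical names.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Grow-only set (G-Set): replicas only ever add elements, and merging two
// replicas is set union. Union is associative, commutative and idempotent,
// so replicas converge whatever the order or number of merges.
final class GSet {
    private final Set<String> elements;

    GSet(Set<String> elements) {
        // Defensive, sorted, read-only copy: a GSet value never changes.
        this.elements = Collections.unmodifiableSet(new TreeSet<>(elements));
    }

    GSet add(String element) {
        Set<String> copy = new TreeSet<>(elements);
        copy.add(element);
        return new GSet(copy);
    }

    GSet merge(GSet other) {
        Set<String> union = new TreeSet<>(elements);
        union.addAll(other.elements);
        return new GSet(union);
    }

    Set<String> elements() {
        return elements;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof GSet && elements.equals(((GSet) o).elements);
    }

    @Override
    public int hashCode() {
        return elements.hashCode();
    }

    @Override
    public String toString() {
        return elements.toString();
    }
}

class GSetDemo {
    public static void main(String[] args) {
        GSet a = new GSet(Set.of("order-1"));
        GSet b = new GSet(Set.of("order-2"));
        GSet c = new GSet(Set.of("order-1", "order-3"));

        System.out.println(a.merge(b).equals(b.merge(a)));                   // commutative
        System.out.println(a.merge(b.merge(c)).equals(a.merge(b).merge(c))); // associative
        System.out.println(a.merge(a).equals(a));                            // idempotent
        System.out.println(a.merge(b).merge(c));
    }
}
```

Because merging is idempotent, a replica that receives the same update twice (as happens under at-least-once delivery) ends up in the same state as one that received it once.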
State Management
6. State in e-Commerce systems
● Stateless (not logged in) browsing enables caching of content
○ But items go out of stock
○ Problems around data consistency
● Challenges in being Stateful
○ Availability takes a hit; a shared-nothing design is not feasible
○ Problem of Consensus when data gets replicated
○ Need for Data durability and guaranteed execution
■ Move from Fail-fast to Succeed-at-any-cost
8. Events
● Events represent Facts
● Events are Immutable
● Events can be triggers for Actions
Event driven services
● Receive and React to Events/Facts
● Publish new Facts to the rest of the world
● Invert the Control Flow, minimizing coupling and increasing autonomy
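A minimal sketch of the two halves of this slide: an immutable event, and services that receive a fact and publish a new one through a bus, so the publisher never calls its consumers directly. `DomainEvent`, `EventBus` and the event names are hypothetical, and the in-memory bus only stands in for real messaging middleware.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// An immutable fact: final class, final fields, no setters.
final class DomainEvent {
    final String type;
    final String subject;

    DomainEvent(String type, String subject) {
        this.type = type;
        this.subject = subject;
    }

    @Override
    public String toString() {
        return type + "(" + subject + ")";
    }
}

// In-memory stand-in for messaging middleware. Publishers only know the bus,
// not who listens: the control flow is inverted.
final class EventBus {
    private final Map<String, List<Consumer<DomainEvent>>> handlers = new ConcurrentHashMap<>();

    void subscribe(String type, Consumer<DomainEvent> handler) {
        handlers.computeIfAbsent(type, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    void publish(DomainEvent event) {
        handlers.getOrDefault(event.type, List.of()).forEach(h -> h.accept(event));
    }
}

class EventDemo {
    static List<String> run() {
        EventBus bus = new EventBus();
        List<String> log = new ArrayList<>();
        // Hypothetical inventory service: reacts to one fact, publishes a new one.
        bus.subscribe("OrderCreated", e -> {
            log.add("inventory saw " + e);
            bus.publish(new DomainEvent("InventoryReserved", e.subject));
        });
        // Hypothetical shipping service: reacts to the inventory service's fact.
        bus.subscribe("InventoryReserved", e -> log.add("shipping saw " + e));
        bus.publish(new DomainEvent("OrderCreated", "order-42"));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```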
9. Event driven Finite State Machine (FSM)
An abstract machine that can be in exactly one of a finite number of states at any given time.
Current State | Input | Next State | Output
Locked | coin | Unlocked | Unlock turnstile so customer can push through
Locked | push | Locked | None
Unlocked | coin | Unlocked | None
Unlocked | push | Locked | When customer has pushed through, lock turnstile
(Source: Wikipedia)
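The turnstile table maps directly onto code: the current state and the input select the next state and the output. A minimal sketch with hypothetical names `TurnstileState`, `TurnstileInput` and `Turnstile`; the output strings are shortened from the table.

```java
// States and inputs from the turnstile table.
enum TurnstileState { LOCKED, UNLOCKED }

enum TurnstileInput { COIN, PUSH }

final class Turnstile {
    private TurnstileState state = TurnstileState.LOCKED; // initial state

    TurnstileState state() {
        return state;
    }

    // Consumes one input event, moves to the next state, and returns the output.
    String on(TurnstileInput input) {
        switch (state) {
            case LOCKED:
                if (input == TurnstileInput.COIN) {
                    state = TurnstileState.UNLOCKED;
                    return "Unlock turnstile";
                }
                return "None"; // push while locked: stay locked
            case UNLOCKED:
                if (input == TurnstileInput.PUSH) {
                    state = TurnstileState.LOCKED;
                    return "Lock turnstile";
                }
                return "None"; // coin while unlocked: stay unlocked
            default:
                throw new IllegalStateException("Unknown state: " + state);
        }
    }

    public static void main(String[] args) {
        Turnstile t = new Turnstile();
        TurnstileInput[] inputs = {
            TurnstileInput.PUSH, TurnstileInput.COIN, TurnstileInput.COIN, TurnstileInput.PUSH};
        for (TurnstileInput in : inputs) {
            String output = t.on(in);
            System.out.println(in + " -> " + t.state() + " / " + output);
        }
    }
}
```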
10. Stateful EDA = Workflow
Events may be:
● User actions
○ E.g. Item packed
● Messages from Messaging middleware
○ E.g. Offer go-live
● Database state change of entities
○ E.g. Order created
“A set of Tasks that execute sequentially or concurrently, and are triggered by Events”
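A workflow in this sense can be sketched as a set of tasks, each waiting on the events it depends on and publishing its result as a new event. This is a hypothetical single-threaded sketch (`TaskDef`, `MiniWorkflow` and all event names are assumptions, and it is not Flux's API): independent tasks run one after another here, where a real orchestrator could run them concurrently, and nothing is persisted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// A task runs once every event it depends on is available; its result is a new event.
final class TaskDef {
    final String name;
    final List<String> dependsOn;
    final String produces;
    final Function<Map<String, String>, String> body;

    TaskDef(String name, List<String> dependsOn, String produces,
            Function<Map<String, String>, String> body) {
        this.name = name;
        this.dependsOn = dependsOn;
        this.produces = produces;
        this.body = body;
    }
}

final class MiniWorkflow {
    private final List<TaskDef> tasks;
    private final Map<String, String> events = new HashMap<>();
    private final Set<String> completed = new HashSet<>();
    private final List<String> executionLog = new ArrayList<>();

    MiniWorkflow(List<TaskDef> tasks) {
        this.tasks = tasks;
    }

    // Records an incoming event (user action, message, DB change) and runs every
    // task it unblocks, repeating until no further task becomes runnable.
    void post(String eventName, String value) {
        events.put(eventName, value);
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (TaskDef task : tasks) {
                if (!completed.contains(task.name) && events.keySet().containsAll(task.dependsOn)) {
                    Map<String, String> inputs = new HashMap<>();
                    for (String dep : task.dependsOn) {
                        inputs.put(dep, events.get(dep));
                    }
                    completed.add(task.name);
                    executionLog.add(task.name);
                    events.put(task.produces, task.body.apply(inputs));
                    progressed = true;
                }
            }
        }
    }

    String event(String name) {
        return events.get(name);
    }

    List<String> executionLog() {
        return List.copyOf(executionLog);
    }

    // Hypothetical order workflow: two independent tasks, then one that also waits for a user action.
    static MiniWorkflow orderWorkflow() {
        return new MiniWorkflow(List.of(
            new TaskDef("reserveInventory", List.of("OrderCreated"), "InventoryReserved",
                in -> "reserved " + in.get("OrderCreated")),
            new TaskDef("generateInvoice", List.of("OrderCreated"), "InvoiceGenerated",
                in -> "invoice for " + in.get("OrderCreated")),
            new TaskDef("ship", List.of("InventoryReserved", "InvoiceGenerated", "ItemPacked"), "Shipped",
                in -> "shipped " + in.get("ItemPacked"))));
    }

    public static void main(String[] args) {
        MiniWorkflow wf = orderWorkflow();
        wf.post("OrderCreated", "order-42"); // database state change
        System.out.println(wf.executionLog()); // "ship" is still waiting for ItemPacked
        wf.post("ItemPacked", "order-42");   // user action
        System.out.println(wf.executionLog() + " -> " + wf.event("Shipped"));
    }
}
```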
11. Flux
● An Asynchronous, Distributed and Reliable state machine based orchestrator
● Used to build Stateful event-driven apps and workflows.
● Simple Primitives and Deployment dependencies
● Available as a Hosted Service at Flipkart and as a stand-alone library
○ Users: Accounting, Seller Platform, Compliance & Document generation, FinTech, etc.
● Stateful, (mostly) Serverless platform
● Open Source: https://github.com/flipkart-incubator/flux
12. Flux use case - Doctor tele-consultation
Deployed on AWS
14. Feature Set
● State Management
● Async and Distributed processing
● Fork & Join
● Parallel processing
● Correlation ID support
● Retriable Errors
● Configurable Retries & Timeouts
● Versioning of workflows
● At-least-once guarantees
● Idempotency at workflow level
● External and Scheduled events
● Cancellation of an ongoing Workflow
● Metrics & Audit
● Dynamic conditional branching
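Idempotency at workflow level and correlation IDs fit together naturally: if each workflow start carries a correlation ID, a redelivered triggering event (expected under at-least-once delivery) can be mapped back to the instance that already exists. A minimal sketch under that assumption; `WorkflowRegistry` is hypothetical and keeps the mapping in memory, whereas a real engine would keep it in its durable state store. This is not a description of how Flux implements the feature.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory registry; a durable engine would persist this mapping.
final class WorkflowRegistry {
    private final Map<String, Long> instanceByCorrelationId = new ConcurrentHashMap<>();
    private final AtomicLong nextInstanceId = new AtomicLong(1);

    // Starting a workflow twice with the same correlation ID returns the existing
    // instance instead of creating a duplicate. ConcurrentHashMap.computeIfAbsent
    // performs the check-and-create step atomically per key.
    long start(String correlationId) {
        return instanceByCorrelationId.computeIfAbsent(correlationId, id -> nextInstanceId.getAndIncrement());
    }

    int instanceCount() {
        return instanceByCorrelationId.size();
    }

    public static void main(String[] args) {
        WorkflowRegistry registry = new WorkflowRegistry();
        long first = registry.start("order-42");
        long redelivered = registry.start("order-42"); // duplicate triggering event
        long other = registry.start("order-43");
        System.out.println(first + " " + redelivered + " " + other + " count=" + registry.instanceCount());
    }
}
```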
23. Fault tolerance
● Guarantees at-least-once execution of Tasks
○ Using durable states, transitions and events
● Local retries for timed-out executions and on Retriable exceptions
○ For @Task instances with ‘retries’ and ‘timeout’ attribute values; retries are executed on the same Flux node
● Global retries for stalled state transitions using Redriver
○ For @Task instances with ‘retries’ and ‘timeout’ attribute values; retries are executed on any of the Flux cluster nodes
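Local retries can be sketched as bounded attempts on the same node: each attempt gets a timeout, timeouts and retriable failures lead to another attempt, and anything else fails immediately. A minimal sketch, assuming a hypothetical `RetriableException` marker and a hypothetical `LocalRetry.execute` helper; it does not use Flux's `@Task` annotation, and the cluster-wide Redriver is only mentioned in a comment, not modelled.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical marker for errors that are worth retrying (e.g. a busy downstream).
class RetriableException extends RuntimeException {
    RetriableException(String message) {
        super(message);
    }
}

final class LocalRetry {
    // Runs the task at most 1 + retries times on this node. Each attempt is bounded
    // by timeoutMs; timeouts and RetriableExceptions trigger another attempt, any
    // other failure is rethrown immediately.
    static <T> T execute(Supplier<T> task, int retries, long timeoutMs, ExecutorService pool)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            Callable<T> call = task::get;
            Future<T> future = pool.submit(call);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the stuck attempt
                last = e;
            } catch (ExecutionException e) {
                if (!(e.getCause() instanceof RetriableException)) {
                    throw e; // non-retriable: fail fast
                }
                last = (RetriableException) e.getCause();
            }
        }
        // Retries exhausted. A stalled transition like this is what a cluster-wide
        // redriver would pick up later (not modelled here).
        throw last;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicInteger attempts = new AtomicInteger();
        try {
            String result = execute(() -> {
                if (attempts.incrementAndGet() < 3) {
                    throw new RetriableException("downstream busy");
                }
                return "invoice generated";
            }, 2, 500, pool);
            System.out.println(result + " after " + attempts.get() + " attempts");
        } finally {
            pool.shutdownNow();
        }
    }
}
```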
31. Additional Reading
● Microsoft Service Fabric: a distributed platform for building stateful microservices
● The Tail at Scale, by Jeff Dean et al.: building systems that tolerate tail latency
36. A Few Rules
● All params and return values should implement the Event interface
● Stateless Classes
● Immutable event objects in the workflow method
● The workflow method must return void
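The rules above can be checked mechanically with reflection. A minimal sketch: `Event`, `@Workflow` and `@Task` are declared locally as stand-ins for Flux's own types so the example compiles on its own (the attribute names `version`, `timeout` and `retries` are assumptions, with `retries` and `timeout` taken from the fault-tolerance slide), and `PackWorkflow`, `RuleChecker` and the event classes are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for Flux's Event interface and @Workflow/@Task annotations.
interface Event {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Workflow { int version(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Task { int version(); long timeout(); int retries() default 0; }

// Immutable event objects: final classes with final fields and no setters.
final class PackRequest implements Event {
    final String shipmentId;
    PackRequest(String shipmentId) { this.shipmentId = shipmentId; }
}

final class TrackingId implements Event {
    final String value;
    TrackingId(String value) { this.value = value; }
}

// Stateless workflow class: no mutable fields, so any node can run any step.
class PackWorkflow {
    @Workflow(version = 1)
    public void onPack(PackRequest request) { // the workflow method returns void
        assignVendor(request);                // a plain call when no interceptor is installed
    }

    @Task(version = 1, timeout = 1000L, retries = 2)
    public TrackingId assignVendor(PackRequest request) {
        return new TrackingId("TRK-" + request.shipmentId);
    }
}

// Hypothetical checker for the rules, using plain reflection.
final class RuleChecker {
    static List<String> violations(Class<?> workflowClass) {
        List<String> problems = new ArrayList<>();
        for (Field f : workflowClass.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !Modifier.isFinal(f.getModifiers())) {
                problems.add("mutable field " + f.getName() + " makes the class stateful");
            }
        }
        for (Method m : workflowClass.getDeclaredMethods()) {
            boolean workflow = m.isAnnotationPresent(Workflow.class);
            boolean task = m.isAnnotationPresent(Task.class);
            if (!workflow && !task) {
                continue;
            }
            for (Class<?> param : m.getParameterTypes()) {
                if (!Event.class.isAssignableFrom(param)) {
                    problems.add(m.getName() + ": parameter " + param.getSimpleName() + " is not an Event");
                }
            }
            Class<?> ret = m.getReturnType();
            if (workflow && ret != void.class) {
                problems.add(m.getName() + ": workflow method must return void");
            }
            // This sketch allows void tasks; non-void task results must be Events.
            if (task && ret != void.class && !Event.class.isAssignableFrom(ret)) {
                problems.add(m.getName() + ": return type " + ret.getSimpleName() + " is not an Event");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(violations(PackWorkflow.class)); // prints [] when all rules hold
    }
}
```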
37. Use Cases - Seller Fulfilment Service
When a seller initiates a pack, we need to call and update various systems. The downstream systems involved are:
1. Vendor Assignor - generates a tracking ID, which has to be shown on the label printed on the shipment.
2. Accounting - generates invoices and various other accounting entries.
3. Doc Service - creates a PDF consisting of the label and the invoice, which has to be printed and attached to the shipment.
4. Logistics Compliance Engine - generates the various government forms required when a shipment has to cross state borders.
Flux orchestrates the calls to these systems.
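The dependencies between these four systems form a small fork/join with a conditional branch: the tracking ID and the invoice can be produced in parallel, the Doc Service needs both, and compliance forms are only needed for inter-state shipments. A minimal sketch of that shape using plain `CompletableFuture`, not Flux's API; `Downstream`, `PackOrchestration` and the returned identifiers are hypothetical stubs.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stubs for the four downstream systems.
final class Downstream {
    static String assignVendor(String shipmentId) { return "TRK-" + shipmentId; }    // Vendor Assignor
    static String generateInvoice(String shipmentId) { return "INV-" + shipmentId; } // Accounting
    static String createPdf(String trackingId, String invoiceId) {                   // Doc Service
        return "PDF(" + trackingId + ", " + invoiceId + ")";
    }
    static String generateForms(String shipmentId) { return "FORMS-" + shipmentId; } // Compliance Engine
}

final class PackOrchestration {
    static final class Result {
        final String pdf;
        final String forms; // null when the shipment stays within one state

        Result(String pdf, String forms) {
            this.pdf = pdf;
            this.forms = forms;
        }
    }

    static CompletableFuture<Result> onPack(String shipmentId, boolean crossesStateBorder) {
        // Fork: the tracking ID and the invoice do not depend on each other.
        CompletableFuture<String> tracking =
                CompletableFuture.supplyAsync(() -> Downstream.assignVendor(shipmentId));
        CompletableFuture<String> invoice =
                CompletableFuture.supplyAsync(() -> Downstream.generateInvoice(shipmentId));
        // Join: the PDF needs the tracking ID (for the label) and the invoice.
        CompletableFuture<String> pdf = tracking.thenCombine(invoice, Downstream::createPdf);
        // Conditional branch: government forms only for inter-state shipments.
        CompletableFuture<String> forms = crossesStateBorder
                ? CompletableFuture.supplyAsync(() -> Downstream.generateForms(shipmentId))
                : CompletableFuture.completedFuture(null);
        return pdf.thenCombine(forms, Result::new);
    }

    public static void main(String[] args) {
        Result r = onPack("S-1001", true).join();
        System.out.println(r.pdf + " " + r.forms);
    }
}
```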
38. Testability
● Unit and Integration Testing
○ As workflows are written in Java, unit and integration tests are written the same way as for any other Java code
● Debugging through IDE
○ You can attach a debugger to the Flux process to debug a workflow during development. More details: https://github.com/flipkart-incubator/flux/wiki/Examples
● Toggling between Async and Sync
○ Removing “FluxClientInterceptorModule” from your Guice injector makes your code run in Sync mode, so you can toggle between Async and Sync whenever you want.
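`FluxClientInterceptorModule` is a Guice module, and the sketch below uses neither Guice nor Flux. It only illustrates, with a JDK dynamic proxy, why removing an interceptor turns a call back into a plain synchronous method call: with the interceptor, the call is captured and handed off; without it, the target runs in place. `PackFlow`, `Wiring` and the `submitted` queue (a stand-in for the orchestrator) are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical workflow interface and implementation.
interface PackFlow {
    void onPack(String shipmentId);
}

final class PackFlowImpl implements PackFlow {
    private final List<String> log;

    PackFlowImpl(List<String> log) {
        this.log = log;
    }

    @Override
    public void onPack(String shipmentId) {
        log.add("packed " + shipmentId);
    }
}

final class Wiring {
    // intercept = true: calls are captured and added to 'submitted' (a stand-in for
    // an orchestrator) and the caller returns immediately.
    // intercept = false: no interceptor, so the call runs synchronously in place.
    static PackFlow wire(PackFlow target, boolean intercept, List<Runnable> submitted) {
        if (!intercept) {
            return target;
        }
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args); // equals/hashCode/toString
            }
            submitted.add(() -> {
                try {
                    method.invoke(target, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            });
            return null; // only void methods in this sketch
        };
        return (PackFlow) Proxy.newProxyInstance(
                PackFlow.class.getClassLoader(), new Class<?>[] {PackFlow.class}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Runnable> submitted = new ArrayList<>();

        wire(new PackFlowImpl(log), false, submitted).onPack("S-1");
        System.out.println("sync:  " + log);

        wire(new PackFlowImpl(log), true, submitted).onPack("S-2");
        System.out.println("async: " + log + ", queued " + submitted.size());
        submitted.forEach(Runnable::run); // the "orchestrator" executes later
        System.out.println("after: " + log);
    }
}
```

Because the workflow code itself is unchanged in both modes, the same class can be unit-tested synchronously and run asynchronously in production.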