1. CONFIDENTIAL – Do Not Distribute | Retail Core Technology
Storm in Retail Context
Catalog data processing using Kafka, Storm & Micro-services
Karthik Deivasigamani
@WalmartLabs
2.
Retail Brick & Mortar
5.
Product Catalog
Taxonomy
Classification => Product Type Category => Shelves
6.
Product Catalog
Attributes
Product Title
Description
Brand
Color
Manufacturer
Model Number
Dimensions
7.
Product Catalog
Product Matching
• UPC, GTIN, PLU, ISBN
• Algorithms
8.
Product Catalog
Grouping
Variants
Bundles
9.
Sources for catalog
• Marketplace Sellers
• Content Providers
• Suppliers
• Merchants
• Legacy Catalogs
Product Catalog
10.
Characteristics of ingestion pipeline
• Zero message loss
• Fault Tolerance
• Source based Priority Queue
• Scale to millions of product updates in an hour.
• Product updates in near real time (NRT)
• Checkpoint at various stages
11.
Processing source data
12.
Processing source data
• Choice of language
• Teams operate independently
• Platform with pluggable services
Bolt
15.
Micro-batched Grouping Pipeline
[Diagram components: Kafka, Spout, Router Bolt, Micro-Batching Bolt, Product Group Emitter Bolt; processing steps: Validate, Persist, Publish; one link is labeled Field Grouping]
Kafka Payload Sample:
{
"variant_product_id" : "1234",
"product_group_id" : "ABC"
}
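The combination of field grouping and micro-batching on this slide can be illustrated with a minimal plain-Java sketch. It does not use the Storm API: the class names, the hash-modulo routing, and the size-based flush are assumptions made for illustration, with only the two payload fields taken from the sample above. The idea is that every variant carrying the same product_group_id lands on the same task, which can then buffer variants and release them as one batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupingSketch {
    /** Picks a task index for a group id; the same id always maps to the same task. */
    static int taskFor(String productGroupId, int numTasks) {
        return Math.floorMod(productGroupId.hashCode(), numTasks);
    }

    /** Buffers variant ids per product group and releases a batch once it reaches batchSize. */
    static final class MicroBatcher {
        private final int batchSize;
        private final Map<String, List<String>> pending = new HashMap<>();

        MicroBatcher(int batchSize) {
            this.batchSize = batchSize;
        }

        /** Returns the completed batch for the group, or an empty list while it is still filling. */
        List<String> add(String productGroupId, String variantProductId) {
            List<String> batch = pending.computeIfAbsent(productGroupId, k -> new ArrayList<>());
            batch.add(variantProductId);
            if (batch.size() < batchSize) {
                return Collections.emptyList();
            }
            pending.remove(productGroupId);
            return batch;
        }

        /** Hands back everything still buffered, for example on a periodic timer tick. */
        Map<String, List<String>> flushAll() {
            Map<String, List<String>> out = new HashMap<>(pending);
            pending.clear();
            return out;
        }
    }
}
```

A real topology would normally also flush on a timer so that small groups are not held indefinitely; flushAll() stands in for that step here.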
16.
Back Pressure
• Message loss
• Spout stops emitting
Knobs
• Spout parallelism
• Kafka message fetch size
• topology.max.spout.pending = maximum number of un-acked tuples allowed per spout task at any given time
• Worker parallelism
• Bolt parallelism
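The effect of the max spout pending knob can be sketched in plain Java as a cap on in-flight tuples. This is only a model of the behaviour, not Storm's implementation: PendingLimiter and its method names are hypothetical. Once the cap is reached the spout stops emitting until an ack or a fail frees a slot.

```java
import java.util.concurrent.Semaphore;

final class PendingLimiter {
    private final Semaphore slots;

    PendingLimiter(int maxPending) {
        this.slots = new Semaphore(maxPending);
    }

    /** Call before emitting; false means the cap is reached and this emit should be skipped. */
    boolean tryEmit() {
        return slots.tryAcquire();
    }

    /** Call when a tuple is acked or failed; frees one slot. */
    void onAckOrFail() {
        slots.release();
    }

    /** Number of tuples that could still be emitted before hitting the cap. */
    int available() {
        return slots.availablePermits();
    }
}
```

Setting the cap too low starves the bolts, while setting it too high lets tuples queue up and time out, which is why it is usually tuned together with the parallelism knobs listed above.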
17.
Failures
• Data Errors
• Service timeouts
• Service outage
• Fatal Errors
• Validations at various stages
• Async IO using RxJava, Hystrix, Retries
• Hystrix Circuit Breaker
• Failing Tuples
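The circuit-breaker idea from this slide can be sketched without any library. The class below is a deliberately simplified stand-in and not Hystrix: it opens after a fixed number of consecutive failures, whereas Hystrix works from an error percentage over a rolling window. All names, thresholds, and the injected clock are assumptions for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** True while calls are being short-circuited. */
    boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < openMillis;
    }

    /** Runs the action, or returns the fallback without calling the service while open. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // (re)open the breaker
            }
            return fallback.get();
        }
    }
}
```

In a bolt, the fallback path is where a tuple would typically be failed so that it is replayed later, instead of blocking on a service that is already down.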
18.
Characteristics of ingestion pipeline
• Zero message loss
– Anchoring and failing tuples, maxOffsetBehind = Long.MAX_VALUE
• Product updates in NRT
• Priority Queue
– Partition based and topic based
• Scale to millions of product updates in an hour.
• Fault Tolerance
– Worker failures and node failures are handled by Storm
– Nimbus and Supervisors are stateless, fail-fast
• Checkpoint at various stages
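The topic-based flavour of the priority queue can be sketched as a poller that always drains the highest-priority topic first. This is a plain-Java illustration with in-memory queues standing in for Kafka topics; the class name and the strict-priority policy are assumptions, since the slides do not describe how the topics are consumed.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

final class TopicPriorityPoller {
    // Insertion order is priority order, highest first.
    private final Map<String, Queue<String>> topics = new LinkedHashMap<>();

    TopicPriorityPoller(String... topicsHighToLow) {
        for (String topic : topicsHighToLow) {
            topics.put(topic, new ArrayDeque<>());
        }
    }

    void publish(String topic, String message) {
        Queue<String> queue = topics.get(topic);
        if (queue == null) {
            throw new IllegalArgumentException("unknown topic: " + topic);
        }
        queue.add(message);
    }

    /** Returns the next message from the highest-priority topic that has one. */
    Optional<String> poll() {
        for (Queue<String> queue : topics.values()) {
            String message = queue.poll();
            if (message != null) {
                return Optional.of(message);
            }
        }
        return Optional.empty();
    }
}
```

Strict priority can starve the low-priority topics under sustained load, so a production setup might instead give each topic its own consumers or a weighted share.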
19.
What we monitor
• Kafka Lag
• Bolt Capacity
• JVM – heap, threads
• Service SLA
• Acked and Failed Tuples
• Data Errors and System Errors
• OS Metrics
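Two of the monitored signals reduce to simple arithmetic, sketched below in plain Java. Kafka lag is the distance between the newest offset in a partition and the consumer's committed offset, and bolt capacity as shown in Storm UI is the executed count multiplied by the average execute latency, divided by the measurement window. The class and parameter names are hypothetical.

```java
final class PipelineMetrics {
    private PipelineMetrics() {}

    /** Messages written to a partition but not yet consumed; never negative. */
    static long kafkaLag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    /** Fraction of the window a bolt spent executing tuples; near 1.0 means it is saturated. */
    static double boltCapacity(long executedCount, double avgExecuteLatencyMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        return executedCount * avgExecuteLatencyMs / windowMs;
    }
}
```

For example, 300,000 tuples at 1.5 ms each over a 10-minute window gives a capacity of 0.75, which leaves some headroom before the bolt needs more parallelism.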
20.
Tools For Monitoring
• Kafkamon – Monitor lag in the pipeline
• Guano – Dump and restore ZK state
• Storm UI
• Elastic & Kibana – Async logging using log4j2, scribe
• Grafana to monitor service latency
• Druid for tracking and analytics
• FIT – Fault Injection Tool
23.
Holiday Season
• A few thousand sellers
• 100M+ seller SKUs
• 6x traffic
How we prepared
• Upgraded to Storm 1.0.2 – HA Nimbus, improved performance, improved backpressure handling
• Change detection
• Improved our monitoring, periodic fault injection
• Fast track / Priority Queue for top items
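The slides do not say how change detection was implemented. One common approach, sketched here purely as an assumption, is to keep a digest of the last payload seen per product and drop any update whose digest has not changed, so that unchanged re-sends from sellers do not consume pipeline capacity. The class name and the in-memory map are hypothetical; a real pipeline would keep the digests in a shared store.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class ChangeDetector {
    private final Map<String, String> lastDigest = new HashMap<>();

    /** Records the payload digest and reports whether it differs from the previous one. */
    boolean hasChanged(String productId, String payload) {
        String digest = sha256Hex(payload);
        String previous = lastDigest.put(productId, digest);
        return !digest.equals(previous);
    }

    /** Hex-encoded SHA-256 of the UTF-8 bytes of the input. */
    static String sha256Hex(String text) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Hashing the raw string treats any formatting difference as a change, so payloads would need to be normalized (for example, sorted keys) before hashing for this to be effective.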
24.
Lessons learnt
• Things will fail
• Monitor everything
• Automation
• Scale is not a feature
• Storm works well with large payloads
• Logs don’t lie
• Micro services come at a cost
25.
Path ahead
• Stateful stream processing
• Storm 1.1.0
– Streaming SQL
– Druid integration
– PMML (Predictive Model Markup Language) support
26.
Team
Yes, we are hiring!
http://www.walmartlabs.com/jobs/