This document describes Facebook-style notifications using HBase and event streams. It discusses capturing user intent from actions like browsing, buying, and registering on ecommerce sites. Notifications are pre-created in real-time by intersecting user intent data with product change events in HBase. The solution uses technologies like Trooper for event processing, Phantom as a reverse proxy, and Flipcast for multi-cast notifications. It provides low-latency access to notifications while scaling to handle large volumes of user, product, and event data.
2. Serving User Intent (eCommerce)
• Mass targeted (Low relevance)
– User Intent Captured from: Browse, Buy, Register
• Quantified, Time-bound (Improved relevance)
– User Intent Derived from: Category Affinity, Recommendations
3. Serving User Intent (social)
Image Source : http://allfacebook.com/
• Near real-time
– Quick updates about friends' actions that most affect you
• Relevant Actions
– Likes, Comments etc.
• Personalized
– Content only from the social circle
• Non-invasive
– Users therefore tolerate less-relevant content better than they do in email
7. Solution 1 : Create Notifications on Visits
[Diagram: on each user visit, gather User Intent and retrieve current and past data from the Intents and Product data stores]
• Pros
– Perceived optimal resource utilization
• Cons
– Gathering, Processing and Serving coupled
– Read path is computationally expensive
– High latency
– Need versioning support on Product data
– Repeated computations
8. Solution 2 : Pre-create in Real-time, Serve on Demand
9. What Leads to a Notification?
Intent (interest expressed by the user) ⋂ Event (e.g. price changes) => Notification
(Intersection of millions of User Intents and Product Change Events)
[Diagram: Intent Event Stream ⋂ Change Event Stream → Notifications]
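The intersection above can be sketched as a fan-out join. In the sketch below, intents are assumed (hypothetically) to be indexed by product, so each incoming change event expands into one notification per interested user; the data shapes and names are illustrative, not the production schema.

```python
# Minimal sketch of Intent ⋂ Event => Notification (illustrative data).
intents_by_product = {
    # product_id -> users who expressed interest (browse, buy, register)
    "MOBDSGU2ZMDYENQ": {"USERID_A", "USERID_B"},
    "MOBDQ9VXXXX6NF8V": {"USERID_B", "USERID_C"},
}

def on_change_event(event):
    """Fan a product change event out to every interested user."""
    product = event["product_id"]
    return [
        {"user": user, "type": event["type"], "product": product}
        for user in sorted(intents_by_product.get(product, ()))
    ]

pending = on_change_event({"product_id": "MOBDSGU2ZMDYENQ", "type": "PRICE_DROP"})
# Two notifications pre-created: one each for USERID_A and USERID_B.
```

Because the join runs at event time rather than at read time, the serving path only has to fetch already-materialized notifications.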
12. The Data Store
• Store large sets of data
– Products (P): 10s of millions
– Users (U): 10s of millions
– Activity (I = U × P): 100s of millions
– Events/day (E = P + U): 10s of millions
– Notifications (N = E ⋂ I): >100 million (in total)
• High write throughput
• High read throughput for sets of data
– Intents: user pivoted, Facts: product pivoted
• Low latency reads
– Notifications – user pivoted, ordered by recency
13. The Data Store - HBase
Row key design for the Notifications table:
U:USERID_A:TIMESTAMP:PRICE_DROP:MOBDSGU2ZMDYENQ
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDSGU2ZMDYENQ
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDQ9VXXXX6NF8V
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDP6W6MCUWCF
U:USERID_C:TIMESTAMP:PRICE_DROP:MOBDQ9VXXXX6NF8V
Storage engine: LSM Tree
Image Sources : http://blog.sematext.com/, http://dailyjs.com/
• Benefits of keeping related data together
– Minimizes disk seeks for row reads
– Rows may be served from the Block cache or MemStore
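One common way to realize this key layout together with the "ordered by recency" property is a zero-padded reverse timestamp in the key. The helper below is a sketch of that idea, not the exact production encoding:

```python
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, a common base for reverse timestamps

def notification_row_key(user_id, ts_millis, event_type, product_id):
    # Zero-padded (LONG_MAX - timestamp) makes HBase's ascending
    # lexicographic key order return a user's newest notifications first.
    reverse_ts = str(LONG_MAX - ts_millis).zfill(19)
    return f"U:{user_id}:{reverse_ts}:{event_type}:{product_id}"

keys = sorted([
    notification_row_key("USERID_B", 1_000, "PRICE_DROP", "MOBDQ9VXXXX6NF8V"),
    notification_row_key("USERID_B", 2_000, "PRICE_DROP", "MOBDSGU2ZMDYENQ"),
    notification_row_key("USERID_A", 1_500, "PRICE_DROP", "MOBDSGU2ZMDYENQ"),
])
# sorted() mimics HBase's key order: each user's rows are contiguous
# (servable by one prefix scan), with the newest event first per user.
```

This is what makes the read path a single short scan per user instead of a scatter-gather.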
14. [Architecture diagram: the Intent Capturing System appends to HBase (Intents, Notifications); the Event Processing System, fed by Product changes, creates, updates and expires Notifications; the Notification Serving System reads them. Phases: Event based capture → Pre-processing → Near real-time Serving]
Tech Stack
– Intent Capturing: Trooper Batch, W3 via Phantom
– Event Processing: Trooper SEDA (RabbitMQ, Mule), CEP (Esper)
– Notification Serving: Phantom, Flipcast, Ceryx, Tomcat, CDN, Memcached
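The event-processing side follows a staged (SEDA) design: each stage consumes from a queue and feeds the next, so stages can be scaled and fail independently. A toy in-process sketch of that shape, with hypothetical stage and queue names (production uses RabbitMQ between stages):

```python
from queue import Queue

# Two decoupled stages joined by queues, in the spirit of Trooper SEDA.
change_events: Queue = Queue()
notification_store: Queue = Queue()

# Illustrative intents table: product_id -> interested users.
INTENTS = {"MOBDSGU2ZMDYENQ": ["USERID_A", "USERID_B"]}

def event_processing_stage():
    """Drain change events, match against intents, emit notifications."""
    while not change_events.empty():
        ev = change_events.get()
        for user in INTENTS.get(ev["product_id"], []):
            notification_store.put({"user": user, "type": ev["type"]})

change_events.put({"product_id": "MOBDSGU2ZMDYENQ", "type": "PRICE_DROP"})
event_processing_stage()
# notification_store now holds the pre-created notifications for serving.
```

The queue boundary is what decouples gathering, processing and serving, the coupling that Solution 1 suffered from.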
15. Tech Stack
• Phantom – Reverse proxy for latency sensitive user actions
• Trooper Batch – Cron jobs
• Trooper SEDA – Distributed, Event processing
• Flipcast – Platform-agnostic multi-cast notifications
• RabbitMQ – Integration, Work distribution
• Esper – Complex Event Processing (Filtering/Matching)
• HBase – Data store
• Tomcat – REST services container for Notifications
• Ceryx – Target Group generation, User preferences
(Legend: Flipkart OSS / Public domain OSS / Closed source)
18. Recap
• Pros
– Low-latency read path; resilient to failure (OK to not show notifications for some users)
– Scales well (LSM trees, KV store, SEDA, CDN for images)
– Immutable Facts and Change Events stored in an append-only data store provide the ability to re-compute notifications
• Cons
– Consistency challenges
• HBase has strong consistency (single write master) but Notification source data can change – leading to Eventual Consistency
– Pre-creating Notifications that may never be seen (cost of storage)
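The re-computation ability called out in the recap follows directly from the append-only design: replaying the change-event log against stored intents rebuilds the notification set deterministically. A sketch under those assumptions (names and data are illustrative):

```python
# Illustrative append-only change-event log and intents table.
event_log = [
    {"product_id": "MOBDSGU2ZMDYENQ", "type": "PRICE_DROP"},
    {"product_id": "MOBDQ9VXXXX6NF8V", "type": "PRICE_DROP"},
]
intents = {
    "MOBDSGU2ZMDYENQ": {"USERID_A"},
    "MOBDQ9VXXXX6NF8V": {"USERID_B"},
}

def recompute_notifications(log, intents):
    """Replay every logged change event against the intents table.

    Replay is idempotent: the same (user, type, product) tuples are
    derived on every run, so a corrupted notification set can simply
    be rebuilt from the immutable facts.
    """
    return [
        {"user": user, "type": ev["type"], "product": ev["product_id"]}
        for ev in log
        for user in sorted(intents.get(ev["product_id"], ()))
    ]

rebuilt = recompute_notifications(event_log, intents)
```

The cost of this safety net is the one listed under Cons: storage for notifications that may never be seen.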