2. Who I am
▪ Norman Huang (normany@yahoo-inc.com)
▪ Software & Data Engineer at Yahoo! Taiwan
▪ Aims to retrieve and deliver data insights via a BI platform and data mining algorithms.
3. Who I am
▪ Jason Lin (jasonysl@yahoo-inc.com)
▪ Software & Data Engineer at Yahoo! Taiwan
▪ Responsible for recommendation system personalization mechanisms and cloud computing development.
6. Challenges
▪ Content stays static until the next batch job runs.
▪ Batch product recommendation algorithms have become a common feature of e-commerce platforms.
8. Challenges
▪ Nearly 72% of visitors made their purchase decision on the same day.
▪ A real-time solution is needed to interact with potential buyers.
[Figure: timeline over Time; older data is "Absorbed into batch views", while the latest "Several hours of data" are "Not yet absorbed"]
15. Pinball
▪ Real-time classifier
▪ Detects buyers’ preferences through streaming data processing
▪ Delivers personalized ads and product recommendations on the fly
▪ Challenges
› How to do it in real time?
› Storm
› How to determine customers’ purchasing desire?
› Buying Intention Detection
31. Buying Intention
▪ Based on our findings:
› The more page views a visitor makes, the higher the chance that the visitor will buy.
› BUT the buying intention value varies from category to category.
35. Learning & Classifier
▪ Online Binary Classification
› Simple and computationally efficient
› Update rule: BI' = BI + (PV - BI) × γ, where PV is the number of page views before the purchase
▪ e.g.
› assumptions: γ = 0.1, BI = 3
› scenario: a user makes 6 page views before purchasing
• BI' = 3 + (6 - 3) × 0.1
• BI' = 3.3
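The update rule on this slide can be sketched in a few lines of Python. This is only an illustration of the formula BI' = BI + (PV - BI) × γ; the function name `update_bi` and its parameter names are hypothetical and do not come from the talk.

```python
def update_bi(bi: float, page_views: int, gamma: float = 0.1) -> float:
    """Move the current BI value toward the observed page-view count.

    Implements BI' = BI + (PV - BI) * gamma from the slide.
    The names here are illustrative assumptions, not the authors' code.
    """
    return bi + (page_views - bi) * gamma


# Worked example from the slide: BI = 3, six page views, gamma = 0.1.
new_bi = update_bi(3.0, 6, gamma=0.1)
print(round(new_bi, 2))  # prints 3.3
```

Because each purchase moves BI by only a fraction γ of the gap, a single unusual session shifts the value slightly, while a sustained change in behavior moves it steadily.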
41. Lambda Architecture
▪ Term created by Nathan Marz (Creator of Apache Storm)
▪ Batch + Real-time processing
› Hybrid batch and real-time processing
› Batch processing is treated as the source of truth, and real-time processing updates models/insights between batches.
Yahoo Confidential & Proprietary
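As a rough illustration of the hybrid idea, the sketch below keeps a batch view (the source of truth) and a real-time view that covers only events seen since the last batch, then merges the two at query time. All names (`batch_view`, `realtime_view`, `on_event`, `query`, `on_batch_complete`) and the sample counts are hypothetical. Clearing the whole real-time view when a batch finishes is a simplifying assumption; a real system would discard only the events the batch has absorbed.

```python
from collections import Counter

# Batch view: page-view counts computed by the last batch job (source of truth).
# The sample numbers are made up for illustration.
batch_view = Counter({"userA": 5, "userB": 2})

# Real-time view: counts from events that arrived after the batch job ran.
realtime_view = Counter()


def on_event(user_id: str) -> None:
    """Speed layer: update the real-time view as each event arrives."""
    realtime_view[user_id] += 1


def query(user_id: str) -> int:
    """Serving layer: merge the batch view with the real-time increments."""
    return batch_view[user_id] + realtime_view[user_id]


def on_batch_complete(new_batch_view: Counter) -> None:
    """Swap in the new batch view and drop the real-time increments.

    Simplification: assumes the new batch has absorbed every event
    currently held in the real-time view.
    """
    global batch_view
    batch_view = new_batch_view
    realtime_view.clear()
```

For example, after `on_event("userB")`, `query("userB")` returns the batch count plus one, so results stay fresh between batch runs.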
52. Find out the heavy users!
▪ Keep track of the number of page views for each user
▪ If the count is greater than 3, the user is a heavy user
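The rule above amounts to a per-user counter with a threshold. A minimal single-process sketch, assuming an in-memory dictionary is enough (the names `page_views`, `record_page_view`, and `HEAVY_USER_THRESHOLD` are hypothetical):

```python
from collections import defaultdict

HEAVY_USER_THRESHOLD = 3  # from the slide: more than 3 page views

page_views = defaultdict(int)  # userid -> page views seen so far


def record_page_view(user_id: str) -> bool:
    """Count one page view and report whether the user is now a heavy user."""
    page_views[user_id] += 1
    return page_views[user_id] > HEAVY_USER_THRESHOLD
```

Calling `record_page_view("userB")` four times returns `False` three times and then `True`, since the fourth view pushes the count above the threshold.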
54. Find out the heavy users!
▪ Topology: User Log Spout → Learning Bolt × 2 (shuffleGroup)
▪ Tuple fields: userid, type, catlv1, catlv2, timestamp
▪ Sample tuples:
userA, xxxxx
userB, xxxxx
userD, xxxxx
userB, xxxxx
userE, xxxxx
userC, xxxxx
userB, xxxxx
userC, xxxxx
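To see what shuffle grouping does to per-user counting, the following sketch simulates it in plain Python rather than using the Storm API. It assumes two bolt instances, as in the diagram, and models the grouping as a uniformly random choice per tuple, which is a simplification of how Storm balances tuples across tasks. The function name `shuffle_group` is hypothetical.

```python
import random
from collections import Counter

# Sample stream from the slide; only the userid field matters here.
stream = ["userA", "userB", "userD", "userB", "userE", "userC", "userB", "userC"]

NUM_BOLTS = 2  # two Learning Bolt instances, as in the diagram


def shuffle_group(tuples, num_bolts, rng):
    """Simulate shuffle grouping: each tuple goes to a randomly chosen bolt.

    Each bolt keeps only its own local page-view counts.
    """
    bolts = [Counter() for _ in range(num_bolts)]
    for user_id in tuples:
        bolts[rng.randrange(num_bolts)][user_id] += 1
    return bolts


bolts = shuffle_group(stream, NUM_BOLTS, random.Random(42))
for i, counts in enumerate(bolts):
    print(f"bolt {i}: {dict(counts)}")
```

Since the same user's tuples can land on different bolts, no single bolt is guaranteed to see all three of userB's page views, so a purely local counter may undercount.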