In this session, you will learn how to design a clickstream analytics application and how to use the same architecture to build your own, so that you are ready to handle the changing world of clickstream data. Dive into advanced user retention and cohort analysis for making near-real-time product and marketing decisions. Learn how to build infrastructure that is fast, easy, and cost-effective with AWS resources such as Amazon Kinesis, Spark on Amazon EMR, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.
13. Data source
• Page
• Click event
• Web log
• Thing event
Use case
Answer
• User retention
• High-spending customer navigation pattern
• User segmentation
• UX improvement
• What deal/ad to try next
14. Requirement
Ingest
• Scalability
• Raw data
• Low running cost
Analyze
• Full visibility
• Data without sampling
• Data from devices in any form factor
• Flexibility
• Join with different datasets
16. @ 100km/s
[Architecture diagram. Stages: Ingest, Store, Process. Producers: JavaScript (Snowplow), AWS SDK, Log4j, Flume, Fluentd, HTTP POST. Components: web servers as the API server (50k rps), Amazon Kinesis as the streaming buffer (24 hours to 7 days), AWS Lambda for compute, Amazon S3 for storage.]
17. @ 100km/s
[Architecture diagram. Stages: Ingest, Store, Process. Producers: JavaScript (Snowplow), AWS SDK, Log4j, Flume, Fluentd, HTTP POST. Components: API Gateway as the API server (50k rps), Amazon Kinesis as the streaming buffer (24 hours to 7 days), AWS Lambda for compute, Amazon S3 for storage.]
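The ingest step in the diagram can be sketched as follows. This is a minimal illustration, not the presenters' code: the event fields (`user_id`, `page`, `ts`) and the stream name `clickstream` are hypothetical. The only AWS-specific fact it relies on is that a Kinesis `PutRecords` request accepts at most 500 records, so producers have to batch.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_kinesis_record(event: Dict) -> Dict:
    """Shape one click event as a PutRecords entry.

    The partition key decides which shard receives the record. Using the
    (hypothetical) user_id field keeps one user's events together on a shard.
    """
    return {
        "Data": json.dumps(event, separators=(",", ":")).encode("utf-8"),
        "PartitionKey": str(event["user_id"]),
    }


def batch_records(
    events: Iterable[Dict], size: int = MAX_RECORDS_PER_CALL
) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` records, one list per PutRecords call."""
    batch: List[Dict] = []
    for event in events:
        batch.append(to_kinesis_record(event))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


# Made-up sample data: 1,200 click events from 7 users.
clicks = [{"user_id": i % 7, "page": "/home", "ts": 1700000000 + i} for i in range(1200)]
print([len(b) for b in batch_records(clicks)])  # [500, 500, 200]

# With boto3 installed and credentials configured, each batch could be sent with:
#   boto3.client("kinesis").put_records(StreamName="clickstream", Records=batch)
```

Partitioning by user is one possible choice; a very active user can create a hot shard, in which case a random or hashed key spreads load more evenly at the cost of per-user ordering.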
22. User retention and growth
[Chart: daily active users (0 to 6,000) against product age in days (0 to 14), with one series each for Product A and Product B]
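A chart like this one could be computed from raw events roughly as shown here. This is a sketch under assumptions: the event tuple layout `(product, user_id, day)`, the launch dates, and the sample data are all hypothetical, and retention is defined as the share of launch-day users who are active again on a later day.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, date]  # (product, user_id, activity day)


def dau_by_product_age(
    events: Iterable[Event], launch_dates: Dict[str, date]
) -> Dict[str, Dict[int, int]]:
    """Count distinct active users per product for each day since launch."""
    seen = defaultdict(set)  # (product, age in days) -> user ids
    for product, user_id, day in events:
        age = (day - launch_dates[product]).days
        if age >= 0:  # ignore activity recorded before launch
            seen[(product, age)].add(user_id)
    result: Dict[str, Dict[int, int]] = defaultdict(dict)
    for (product, age), users in sorted(seen.items()):
        result[product][age] = len(users)
    return dict(result)


def retention_curve(
    events: Iterable[Event], launch_dates: Dict[str, date], product: str
) -> Dict[int, float]:
    """Fraction of the product's day-0 users who are active on each day."""
    by_age = defaultdict(set)
    for prod, user_id, day in events:
        if prod == product:
            by_age[(day - launch_dates[prod]).days].add(user_id)
    cohort = by_age.get(0, set())
    if not cohort:
        return {}
    return {
        age: len(users & cohort) / len(cohort)
        for age, users in sorted(by_age.items())
        if age >= 0
    }


launch = {"A": date(2016, 1, 1), "B": date(2016, 1, 1)}
events = [
    ("A", "u1", date(2016, 1, 1)),
    ("A", "u2", date(2016, 1, 1)),
    ("A", "u1", date(2016, 1, 1)),  # duplicate event, counted once
    ("A", "u1", date(2016, 1, 2)),
    ("B", "u3", date(2016, 1, 1)),
    ("B", "u3", date(2016, 1, 3)),
]
print(dau_by_product_age(events, launch))    # {'A': {0: 2, 1: 1}, 'B': {0: 1, 2: 1}}
print(retention_curve(events, launch, "A"))  # {0: 1.0, 1: 0.5}
```

Because the counts use sets of user IDs, the result is exact and unsampled, which matches the "data without sampling" requirement stated earlier.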
23. High churn = wasted ad dollars
[Chart: dollar amount ($0 to $25,000) against product age in days (1 to 14), with one series each for Product A and Product B]
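One way to put a number on this claim is to split ad spend between users who stayed and users who churned. The function below is an illustrative assumption, not the formula behind the chart, and the figures in the usage lines are made up.

```python
def churn_cost(ad_spend: float, acquired_users: int, retention_rate: float) -> dict:
    """Split ad spend between retained and churned users.

    retention_rate is the fraction of acquired users still active at the end
    of the measurement window (for example, day 14).
    """
    if acquired_users <= 0:
        raise ValueError("acquired_users must be positive")
    if not 0.0 < retention_rate <= 1.0:
        raise ValueError("retention_rate must be in (0, 1]")
    retained_users = acquired_users * retention_rate
    return {
        "retained_users": retained_users,
        "cost_per_retained_user": ad_spend / retained_users,
        "spend_on_churned_users": ad_spend * (1.0 - retention_rate),
    }


# Hypothetical figures: same spend and installs, different retention.
low_retention = churn_cost(10_000, 2_000, 0.25)
high_retention = churn_cost(10_000, 2_000, 0.5)
print(low_retention["cost_per_retained_user"])   # 20.0
print(high_retention["cost_per_retained_user"])  # 10.0
print(low_retention["spend_on_churned_users"])   # 7500.0
```

With identical spend, halving retention doubles the cost of each user who actually stays.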
25. BUSINESS MEDIA
operates more than 20 business-to-business companies with significant holdings in the automotive, electronic, medical and finance industries
MAGAZINES
publishes 20 U.S. titles and close to 300 international editions
BROADCASTING
comprises 31 television and two radio stations
NEWSPAPERS
owns 15 daily and 34 weekly newspapers
Hearst includes over 200 businesses in over 100 countries around the world
36. Why do we need machine learning for this?
The social media stream is high-volume, and most of the messages are not CS-actionable
37. [Architecture diagram. Stages: Ingest, Store, Process, Action. Ingest: Logstash, bot, app, and crawlers, via the AWS SDK. Store: Amazon Kinesis, Amazon S3. Process: AWS Lambda, machine learning, analysts (via the AWS SDK). Action: notification, support issue, feature request, database.]
Keep training the ML model with new data
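The processing step in this diagram can be sketched as a Lambda handler that reads records from a Kinesis event and routes each message by predicted class. The event shape (`Records[].kinesis.data`, base64-encoded) is how Lambda delivers Kinesis records. Everything else is a hypothetical stand-in: the `text` field, the label names, and above all `keyword_classifier`, which only marks the place where a trained model would be called.

```python
import base64
import json
from typing import Callable, Dict, List

# Hypothetical labels, named after the outcomes shown in the diagram.
SUPPORT_ISSUE = "support_issue"
FEATURE_REQUEST = "feature_request"
NOT_ACTIONABLE = "not_actionable"


def keyword_classifier(text: str) -> str:
    """Placeholder for the trained model; real code would request a prediction."""
    lowered = text.lower()
    if any(w in lowered for w in ("error", "broken", "cannot", "crash")):
        return SUPPORT_ISSUE
    if any(w in lowered for w in ("please add", "would be great")):
        return FEATURE_REQUEST
    return NOT_ACTIONABLE


def handler(
    event: Dict,
    context=None,
    classify: Callable[[str], str] = keyword_classifier,
) -> Dict[str, List[Dict]]:
    """Decode each Kinesis record and group the messages by predicted label."""
    routed: Dict[str, List[Dict]] = {
        SUPPORT_ISSUE: [],
        FEATURE_REQUEST: [],
        NOT_ACTIONABLE: [],
    }
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])  # data is base64
        message = json.loads(payload)
        routed[classify(message["text"])].append(message)
    return routed


def make_record(text: str) -> Dict:
    """Build a test record in the shape Lambda receives from Kinesis."""
    data = base64.b64encode(json.dumps({"text": text}).encode("utf-8"))
    return {"kinesis": {"data": data.decode("ascii")}}


sample_event = {
    "Records": [
        make_record("App crash on login"),
        make_record("Please add dark mode"),
        make_record("Nice weather today"),
    ]
}
result = handler(sample_event)
print({label: len(msgs) for label, msgs in result.items()})
# {'support_issue': 1, 'feature_request': 1, 'not_actionable': 1}
```

Passing the classifier in as a parameter keeps the routing logic testable and lets the model be swapped as it is retrained on new data.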
38. [Architecture diagram. Stages: Ingest, Store, Process, Action. Ingest: bot and Messenger, via the AWS SDK. Store: Amazon Kinesis, Amazon S3. Process: AWS Lambda, machine learning, analysts. Action: bot and app get a prediction.]
Keep training the ML model with new data
45. Sushiro – Real-time streaming & analysis
Real-time data ingested by Amazon Kinesis is analyzed in Amazon Redshift
• 380 stores stream live data from sushi plates
• Inventory information combined with consumption information in near real time
• Forecast demand by store, minimize food waste, and improve efficiencies
46. [Data lake diagram. Inputs: source DBs, 3rd-party data, log data. Data lake on Amazon S3 as the source of truth. Outputs: processing, analysis, reporting.]
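When S3 is the source of truth, a predictable key layout makes it easier for processing and reporting jobs to read only the data they need. The sketch below builds time-partitioned object keys. The `raw/<source>/year=.../hour=...` layout is a common convention assumed here for illustration, not something the slide specifies, and the source and file names are hypothetical.

```python
from datetime import datetime, timezone


def raw_event_key(source: str, event_time: datetime, file_name: str) -> str:
    """Build an S3 object key partitioned by source and UTC hour.

    event_time must be timezone-aware; it is converted to UTC so that all
    producers agree on partition boundaries.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    t = event_time.astimezone(timezone.utc)
    return (
        f"raw/{source}/year={t:%Y}/month={t:%m}/day={t:%d}/hour={t:%H}/{file_name}"
    )


key = raw_event_key(
    "clickstream",
    datetime(2016, 7, 4, 13, 5, tzinfo=timezone.utc),
    "part-0001.json.gz",
)
print(key)  # raw/clickstream/year=2016/month=07/day=04/hour=13/part-0001.json.gz
```

Keeping raw data under its own prefix leaves it immutable, so derived datasets can always be rebuilt from it.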