Slides from my talk at the IEEE Conference on Visual Analytics Science and Technology (VAST) 2014 in Paris, France.
ABSTRACT
Logging user activities is essential to data analysis for internet products and services.
Twitter has built a unified logging infrastructure that captures user activities across all clients it owns, making it one of the largest datasets in the organization.
This paper describes challenges and opportunities in applying information visualization to log analysis at this massive scale, and shows how various visualization techniques can be adapted to help data scientists extract insights.
In particular, we focus on two scenarios: (1) monitoring and exploring a large collection of log events, and (2) performing visual funnel analysis on log data with tens of thousands of event types.
Two interactive visualizations were developed for these purposes. We discuss design choices and the implementation of these systems, along with case studies of how they are being used in day-to-day operations at Twitter.
23. Log data
in Hadoop
Engineers & Data Scientists
billions of rows
24. Log data
in Hadoop
Aggregate
Client event collection
10,000+ event types
date      client  page  section  component  element  action      count
20141011  web     home  home     -          -        impression  100
20141011  web     home  wtf      -          -        click       20
Engineers & Data Scientists
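The aggregation step on this slide can be sketched as a simple count per day and event name. This is only an illustration under stated assumptions: the real pipeline runs on Hadoop over billions of rows, and the raw rows and the function name `aggregate_daily_counts` below are hypothetical.

```python
from collections import Counter

# Hypothetical raw log rows, one per logged event:
# (date, client, page, section, component, element, action)
raw_events = [
    ("20141011", "web", "home", "home", "-", "-", "impression"),
    ("20141011", "web", "home", "home", "-", "-", "impression"),
    ("20141011", "web", "home", "wtf", "-", "-", "click"),
]


def aggregate_daily_counts(rows):
    """Count how many times each (date, event name) combination occurs."""
    return Counter(rows)


daily_counts = aggregate_daily_counts(raw_events)

# Print one aggregated row per (date, event name), with the count last.
for key, count in sorted(daily_counts.items()):
    print(" ".join(key), count)
```

The aggregated table is far smaller than the raw logs, which is what makes it practical to search and chart.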
25. Log data
in Hadoop
Aggregate
Client event collection
10,000+ event types
date      client  page  section  component  element  action      count
20141011  web     home  home     -          -        impression  100
20141011  web     home  wtf      -          -        click       20
(wtf = Who-to-Follow)
Engineers & Data Scientists
26. Log data
in Hadoop
Aggregate
Client event collection
Engineers & Data Scientists
27. Log data
in Hadoop
Aggregate
Client event collection
client page section component element action
Find
Search
Engineers & Data Scientists
30. Client event collection
Search
client page section component element action
Find
Log data
in Hadoop
Aggregate
web home * * * impression
Engineers & Data Scientists
31. Client event collection
Search
Query
client page section component element action
Find
Aggregate
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
web home * * * impression
Engineers & Data Scientists
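The wildcard query on this slide (`web home * * * impression`) can be read as a per-field match against the six-part event name. The sketch below is a minimal illustration of that reading, not the actual search implementation; the function names are hypothetical, and the last entry in `events` is made up to show a non-match.

```python
FIELDS = ("client", "page", "section", "component", "element", "action")


def parse_query(query):
    """Split a space-separated query into one pattern per field."""
    parts = query.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return parts


def matches(event_name, pattern):
    """True if every field equals its pattern, where '*' matches anything."""
    parts = [p.strip() for p in event_name.split(":")]
    return len(parts) == len(pattern) and all(
        want == "*" or want == got for want, got in zip(pattern, parts)
    )


def search(event_names, query):
    """Return the event names that match the wildcard query."""
    pattern = parse_query(query)
    return [name for name in event_names if matches(name, pattern)]


events = [
    "web : home : home : - : - : impression",
    "web : home : wtf : - : - : impression",
    "web : home : wtf : - : - : click",
    "web : profile : - : - : - : impression",  # hypothetical event name
]

results = search(events, "web home * * * impression")
for name in results:
    print(name)
```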
32. Client event collection
Search
Query
client page section component element action
Find
Aggregate
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
search can be better
Engineers & Data Scientists
33. Client event collection
Search
Query
client page section component element action
Find
Aggregate
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
10,000+ event types
search can be better
Engineers & Data Scientists
34. Client event collection
10,000+ event types
What are all sections under web:home?
Search
Query
not everybody knows
client page section component element action
Find
Aggregate
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
search can be better
Engineers & Data Scientists
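One way to answer "What are all sections under web:home?" without knowing the names in advance is to arrange the event names into a tree keyed by each level of the hierarchy. This is a sketch under that assumption; `build_tree` and `children` are hypothetical helper names, not part of the system described in the talk.

```python
def build_tree(event_names):
    """Nest event names into dicts, one level per field of the hierarchy."""
    tree = {}
    for name in event_names:
        node = tree
        for part in (p.strip() for p in name.split(":")):
            node = node.setdefault(part, {})
    return tree


def children(tree, prefix):
    """List the distinct values found directly under a prefix such as 'web:home'."""
    node = tree
    for part in prefix.split(":"):
        node = node.get(part.strip(), {})
    return sorted(node)


events = [
    "web : home : home : - : - : impression",
    "web : home : wtf : - : - : impression",
]

tree = build_tree(events)
sections = children(tree, "web:home")
print(sections)
```

Browsing the tree level by level lets someone discover event names instead of having to guess a query.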
35. Client event collection
Search
Query
client page section component element action
Find
Aggregate
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
search can be better
one graph / event
10,000+ event types
not everybody knows
What are all sections under web:home?
Engineers & Data Scientists
36. Client event collection
Search
Query
client page section component element action
Find
Aggregate
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
search can be better
one graph / event
x 10,000
10,000+ event types
not everybody knows
What are all sections under web:home?
Engineers & Data Scientists
42. narrow down
See
Interactions
search box => filter
Client event collection
Engineers & Data Scientists
43. See
How to visualize?
narrow down
Client event collection
Engineers & Data Scientists
Interactions
search box => filter
44. Interactions client : page : section : component : element : action
search box => filter
See
How to visualize?
narrow down
Client event collection
Engineers & Data Scientists
60. Funnel analysis
home page
banana : home : - : - : - : impression
profile page
banana : profile : - : - : - : impression
search page
banana : search : - : - : - : impression
Specify all funnels manually!
n jobs
n hours
61. Goal
home page
banana : home : - : - : - : impression
… … …
1 job => all funnels, visualized
62. Related work
• Visualize an overview of event sequences
[Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
63. Related work
• Visualize an overview of event sequences
[Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
• Big data? eBay checkout sequences
[Shen et al. 2013]
One funnel at a time
Checkout > Payment > Confirm > Success
96. Final process
1. Define set of events
2. Pick alignment, direction and window size
3. Run Hadoop job (with more aggregation)
4. Wait for it… (2+ hrs)
5. Visualize
gazillion patterns (TBs)
~100,000 patterns (10MB)
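Steps 1 to 3 above can be sketched on a toy scale: keep only the chosen events, align each session at one event, take a fixed window in one direction, and count identical patterns. This is an in-memory illustration, not the Hadoop job from the talk. The interpretation of "alignment", "direction" and "window size" here is an assumption, and the sessions and function names are hypothetical.

```python
from collections import Counter

HOME = "banana:home:-:-:-:impression"
PROFILE = "banana:profile:-:-:-:impression"
SEARCH = "banana:search:-:-:-:impression"


def extract_pattern(session, events_of_interest, align_event, direction="after", window=1):
    """Return the aligned event plus up to `window` events after (or before) it."""
    filtered = [e for e in session if e in events_of_interest]
    if align_event not in filtered:
        return None
    i = filtered.index(align_event)  # align at the first occurrence
    if direction == "after":
        return tuple(filtered[i : i + 1 + window])
    if direction == "before":
        return tuple(filtered[max(0, i - window) : i + 1])
    raise ValueError("direction must be 'after' or 'before'")


def aggregate_patterns(sessions, events_of_interest, align_event, direction="after", window=1):
    """Count how many sessions produce each pattern."""
    counts = Counter()
    for session in sessions:
        pattern = extract_pattern(session, events_of_interest, align_event, direction, window)
        if pattern is not None:
            counts[pattern] += 1
    return counts


# Hypothetical per-session event sequences.
sessions = [
    [HOME, PROFILE],
    [HOME, SEARCH, PROFILE],
    [HOME, PROFILE],
    [SEARCH, HOME],
]

patterns = aggregate_patterns(sessions, {HOME, PROFILE, SEARCH}, align_event=HOME, window=1)
for pattern, count in patterns.most_common():
    print(count, " > ".join(pattern))
```

Counting identical patterns is what shrinks the data: many sessions collapse into a much smaller set of distinct patterns that a visualization can load.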
98. Deployment
• Since Jan 2013
• Fewer users, but more in-depth ad-hoc analysis
• Initial meeting to provide support
99. Case studies
• What did users do when they visited Twitter? (in demo)
• Where did users give up in the sign-up process?
• more in the paper
100. Case studies
• What did users do when they visited Twitter? (in demo)
• Where did users give up in the sign-up process?
click on “sign up” => fill personal info => import address book => etc.
• more in the paper
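The sign-up question comes down to comparing how many users reach each step with how many reach the next one. The sketch below shows that arithmetic only. The step names come from the slide, but the counts are invented for illustration and are not results from the talk or the paper; `drop_off` is a hypothetical helper.

```python
def drop_off(step_counts):
    """For each consecutive pair of steps, report users lost and the loss rate."""
    report = []
    for (step, reached), (next_step, reached_next) in zip(step_counts, step_counts[1:]):
        lost = reached - reached_next
        rate = lost / reached if reached else 0.0
        report.append((step, next_step, lost, rate))
    return report


# Invented counts, for illustration only.
steps = [
    ("click on sign up", 1000),
    ("fill personal info", 600),
    ("import address book", 300),
]

report = drop_off(steps)
for step, next_step, lost, rate in report:
    print(f"{step} => {next_step}: lost {lost} users ({rate:.0%})")

# The transition with the highest loss rate is where most users give up.
worst = max(report, key=lambda row: row[3])
```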
101. Case studies
• What did users do when they visited Twitter? (in demo)
• Where did users give up in the sign-up process?
• more in the paper
read the paper :)
103. Conclusions & Future work
• Large-scale User Activity Logs + Visual Analytics
• Find, Monitor & Explore
+ Anomaly detection & automatic alert
• Funnel Analysis
+ More interactivity & data / reduce wait time / latency study?
• Used in day-to-day operations at Twitter
104. Conclusions & Future work
• Large-scale User Activity Logs + Visual Analytics
• Find, Monitor & Explore
+ Anomaly detection & automatic alert
• Funnel Analysis
+ More interactivity & data / reduce wait time / latency study?
• Used in day-to-day operations at Twitter
Challenge: big data => aggregate & sacrifice => small data => visualize & interact
105. Conclusions & Future work
• Large-scale User Activity Logs + Visual Analytics
• Find, Monitor & Explore
+ Anomaly detection & automatic alert
• Funnel Analysis
+ More interactivity & data / reduce wait time / latency study?
• Used in day-to-day operations at Twitter
• Generalize to smaller systems
Challenge: big data => aggregate & sacrifice => small data => visualize & interact
106. Acknowledgement
• Data Scientists & Engineers @Twitter — Linus Lee, Chuang Liu
• Feedback from reviewers, Ben Shneiderman & Catherine Plaisant
kristw@twitter.com / @kristw