Flexible Event Tracking (Paul Gebheim)
2. How can we effectively use our data to make Justin.tv better?
3. Questions
Who does what, and how?
Funnels
How valuable are groups of users?
Virality
Are our changes working?
Retention, Funnel Conversion
4. The Dream
A general framework for creating, deploying, and analyzing A/B
tests in terms of Funnels, Virality, and Retention.
5. Backend Dreams
Flexibility
Queryability
Scalability
... and it should be easy to work with
6. Backend Dreams... come true
Schema-less!
Rich data access/manipulation toolset
At home in a web-centric toolchain
Sharding, Map/Reduce, Replication
8. Aggregating Data
Web Site Events
[{
    "name": "front_page/broadcast_click",
    "date": "2010-04-20 12:00:00-0700",
    "unique_id": "fRx8zq",
    "bucket": "big_red_button"
},
{
    "name": "front_page/broadcast_click",
    "date": "2010-04-20 12:01:00-0700",
    "unique_id": "9aB8c2",
    "bucket": "small_blue_button"
}]
9. Aggregating Data
Video System Events
[{
    "name": "broadcast/started",
    "date": "2010-04-20 12:10:00-0700",
    "unique_id": "fRx8zq",
    "bucket": "big_red_button",
    "channel": "my_1337_ch4nn31l"
}]
10. Processing Data
Python
Map/Reduce
Configuration Documents
Generate/Apply MongoDB operations
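One way the "configuration documents" and "generate MongoDB operations" steps could fit together is sketched below. The config layout (`COUNTER_CONFIG`, `key_field`) and the function `build_increment` are hypothetical, not the format Justin.tv used; only the event fields and the `$inc` operator come from the slides.

```python
# Hypothetical configuration document: which event names to count,
# and which event field splits the count. The layout is an assumption.
COUNTER_CONFIG = {
    "front_page/broadcast_click": {"key_field": "bucket"},
    "broadcast/started": {"key_field": "bucket"},
}


def build_increment(event, config=COUNTER_CONFIG):
    """Return a (filter, update) pair for a MongoDB upsert, or None if
    the event name is not configured for counting."""
    rule = config.get(event["name"])
    if rule is None:
        return None
    day = event["date"][:10]  # assumes dates start with YYYY-MM-DD
    count_key = "counts.%s.%s" % (day, event[rule["key_field"]])
    return {"_id": event["name"]}, {"$inc": {count_key: 1}}


event = {
    "name": "front_page/broadcast_click",
    "date": "2010-04-20 12:00:00-0700",
    "unique_id": "fRx8zq",
    "bucket": "big_red_button",
}
operation = build_increment(event)
print(operation)
```

Keeping the generation step free of database calls means the produced operations can be inspected or tested before anything is applied.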
12. Example
Historical Data with SQL:
select
    event_name, bucket, count(*)
from
    events
group by event_name, bucket;
13. Mongo can do that!
For small datasets, use collection.group()
var count_events_per_bucket = function() {
    return db.events.group({
        key: {name: 1, bucket: 1},
        cond: {/* include all events */},
        // called once per event; aggregate is the running result for its key
        reduce: function(event, aggregate) {
            aggregate.count += 1;
        },
        initial: {
            count: 0
        }
    });
};
14. Mongo can do that!
For large datasets, use collection.mapReduce()
var count_events_per_bucket_big = function() {
    var res = db.events.mapReduce(
        // map: emit a 1 for each (name, bucket) pair
        function() {
            emit({
                name: this.name,
                bucket: this.bucket
            }, 1);
        },
        // reduce: sum the emitted values for one key
        function(key, values_list) {
            var count = 0;
            values_list.forEach(function(v) {
                count += v;
            });
            return count;
        },
        // newer MongoDB releases require an output collection;
        // the name used here is an assumption
        {out: "events_per_bucket"}
    );

    return db[res.result].find();
};
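To make the map and reduce roles concrete without a database, here is a minimal pure-Python sketch of the same counting job. The helper names (`map_event`, `reduce_counts`, `map_reduce`) and the sample events are illustrative assumptions; it imitates the data flow only, not MongoDB's execution model.

```python
from collections import defaultdict


def map_event(event):
    """Mirror of the JavaScript map: emit ((name, bucket), 1)."""
    yield (event["name"], event["bucket"]), 1


def reduce_counts(key, values):
    """Mirror of the JavaScript reduce: sum the emitted values."""
    return sum(values)


def map_reduce(events):
    # Group every emitted value under its key, then reduce each group.
    grouped = defaultdict(list)
    for event in events:
        for key, value in map_event(event):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


events = [
    {"name": "front_page/broadcast_click", "bucket": "big_red_button"},
    {"name": "front_page/broadcast_click", "bucket": "small_blue_button"},
    {"name": "front_page/broadcast_click", "bucket": "big_red_button"},
    {"name": "broadcast/started", "bucket": "big_red_button"},
]
counts = map_reduce(events)
```

MongoDB may call reduce more than once for the same key, feeding earlier results back in, so the reduce has to accept its own output. A plain sum satisfies that.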
15. Mongo can also...
be used to do the counting in real time!
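from pymongo import MongoClient

# Assumption: a local MongoDB server and a database named event_db.
event_db = MongoClient().event_db

def extractDay(date):
    # Assumption: dates are strings that start with YYYY-MM-DD.
    return date[:10]

# For each counted event name, how to pick the sub-key to increment.
matchers = {
    "front_page/broadcast_click": lambda event: event["bucket"],
    "broadcast/started": lambda event: event["bucket"],
}

for event in events:  # events: an iterable of incoming event dicts
    key = event["name"]
    if key in matchers:
        count_key = "counts.%s.%s" % (
            extractDay(event["date"]),
            matchers[key](event))
        # upsert creates the counter document the first time a name is seen
        event_db.event_counts.update_one(
            {"_id": key},
            {"$inc": {count_key: 1}},
            upsert=True)
    event_db.events.insert_one(event)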
16. Example
How the results appear in Mongo
> db.event_counts.find()
{
    "_id": "front_page/broadcast_click",
    "counts": {
        "2010-04-20": {
            "big_red_button": 1231,
            "small_blue_button": 86
        }
    }
}
{
    "_id": "broadcast/started",
    "counts": {
        "2010-04-20": {
            "big_red_button": 72,
            "small_blue_button": 6
        }
    }
}
>
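The nested shape follows from MongoDB's dot notation: a `$inc` on a key such as `counts.2010-04-20.big_red_button` creates the intermediate sub-documents when they are missing. The sketch below imitates that behaviour on a plain dict; `apply_inc` is a hypothetical stand-in for illustration, not a PyMongo function.

```python
def apply_inc(document, inc_spec):
    """Apply {"dotted.path": amount} increments to a nested dict in place,
    creating intermediate dicts as needed (a rough stand-in for $inc)."""
    for path, amount in inc_spec.items():
        *parents, leaf = path.split(".")
        node = document
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = node.get(leaf, 0) + amount
    return document


doc = {"_id": "front_page/broadcast_click"}
apply_inc(doc, {"counts.2010-04-20.big_red_button": 1})
apply_inc(doc, {"counts.2010-04-20.big_red_button": 1})
apply_inc(doc, {"counts.2010-04-20.small_blue_button": 1})
print(doc)
```

Because the day and bucket are part of the key path, one document per event name holds every daily counter for that event.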
17. What’s that we have there?
First, Click the “Broadcast Button”
Then, Start Broadcasting
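Read as a two-step funnel, the counters give a rough conversion rate per bucket. A minimal sketch using the 2010-04-20 counts from the example output; the function name `conversion_by_bucket` is hypothetical. These are raw event counts rather than unique users, so the ratios are only approximate.

```python
def conversion_by_bucket(first_step, second_step):
    """Fraction of first-step events that reached the second step,
    per bucket. Buckets with no first-step events are skipped."""
    return {
        bucket: second_step.get(bucket, 0) / clicks
        for bucket, clicks in first_step.items()
        if clicks
    }


# Counts for 2010-04-20, copied from the event_counts example output.
clicks = {"big_red_button": 1231, "small_blue_button": 86}
starts = {"big_red_button": 72, "small_blue_button": 6}

rates = conversion_by_bucket(clicks, starts)
for bucket, rate in sorted(rates.items()):
    print("%s: %.1f%%" % (bucket, rate * 100))
# big_red_button: 5.8%
# small_blue_button: 7.0%
```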
18. We can add more events...
First, Click the “Broadcast Button”
Authenticate
Click Flash “Allow” or “Disallow” box
Share with friends
...
Then, Start Broadcasting
19. Periodic Map/Reduce
Computing a bunch of stuff every half hour is fine if it's fast enough
A program can generate arbitrarily complex Map/Reduce code...
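As a small illustration of generated Map/Reduce code, the sketch below builds the JavaScript source of a map function from a list of event fields. `generate_map_function` is a hypothetical helper and the output format is an assumption; the slides do not show the actual generator.

```python
import json


def generate_map_function(key_fields):
    """Return JavaScript source for a map function that emits a count
    of 1 under a key built from the given event fields."""
    # json.dumps quotes each field name safely for use in JavaScript.
    key_lines = ",\n".join(
        "        %s: this[%s]" % (json.dumps(field), json.dumps(field))
        for field in key_fields
    )
    return "function() {\n    emit({\n%s\n    }, 1);\n}" % key_lines


map_js = generate_map_function(["name", "bucket"])
print(map_js)
```

The resulting string could be passed as the map argument of a mapReduce call, with a fixed summing reduce alongside it.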
20. Accurate Funnel Calculation
• Per user rollup
– For each user, which funnel steps they have reached, with constraints applied
– A map to get unique users, a reduce to count which unique events they triggered
• Per bucket rollup
– For each bucket, how many users are at each ‘step’ in the funnel
– Sum counts at each step per bucket
25. Future work
Migrating old Postgres-backed system to MongoDB
Real-time calculation for time series
Batch jobs for Funnel, Retention, and Virality