SlideShare a Scribd company logo
1 of 25
Flexible Event Logging

Analyzing Funnels, Retention, and
  Viral Spread with MongoDB


       Paul Gebheim - Justin.tv
How can we effectively use our data to make
            Justin.tv better?
Questions

   Who does what, and how?
          Funnels


How valuable are groups of users?
             Virality


   Are our changes working?
  Retention, Funnel Conversion
The Dream




A general framework for creating, deploying, and analyzing A/B
      tests in terms of Funnels, Virality, and Retention.
Backend Dreams


              Flexibility

            Queryability

             Scalability

... and it should be easy to work with
Backend Dreams... come true


             Schema-less!

  Rich data access/manipulation toolset

   At home in a web-centric toolchain

   Sharding, Map/Reduce, Replication
Lets build it...
Aggregating Data
              Web Site Events

     [{
      "name": "front_page/broadcast_click",
      "date": "2010-04-20 12:00:00-7000",
      "unique_id": "fRx8zq",
      "bucket": "big_red_button"
},
{
      "name": "front_page/broadcast_click",
      "date": "2010-04-20 12:01:00-7000",
      "unique_id": "9aB8c2",
      "bucket": "small_blue_button"
}]
Aggregating Data
            Video System Events



[{
      "name": "broadcast/started",
      "date": "2010-04-20 12:10:00-7000",
      "unique_id": "fRx8zq",
      "bucket": "big_red_button",
      "channel": "my_1337_ch4nn31l",
}]
Processing Data


             Python

          Map/Reduce

     Configuration Documents

Generate/Apply MongoDB operations
Example:




Count how many times each event occurs per 'bucket'
Example
        Historical Data with SQL:




1   select
2       event_name, bucket, count(*)
3   from
4       events
5   group by event_name, bucket;
Mongo can do that!
 For small datasets, use collection.group()

 1   var count_events_per_bucket = function() {
 2       return db.events.group({
 3           key: {name: 1, bucket: 1},
 4           cond: {/* include all events */},
 5           reduce: function(event, aggregate) {
 6               aggregate.count += 1;
 7           },
 8           initial: {
 9               count: 0
10           }
11       });
12   }
Mongo can do that!
For large datasets, use collection.mapReduce()
 1   var count_events_per_bucket_big = function() {
 2       var res = db.events.mapReduce(
 3           // map
 4           function() {
 5               emit({
 6                   name: this.name,
 7                   bucket: this.bucket
 8               }, 1);
 9           },
10           // reduce
11           function(key, values_list) {
12               var count=0;
13               each(values_list, function(v,n) {
14                   count += v;
15               });
16               return count;
17           }
18       );
19
20       return db[res.result].find();
21   };
Mongo can also...
              be used to do the counting in real time!

 1   matchers = {
 2        "front_page/broadcast_click": lambda event: event["bucket"],
 3        "broadcast/started": lambda event["bucket"]
 4   }
 5
 6   for event in events:
 7       key = event["name"]
 8       if key in matchers:
 9           count_key = "counts.%s.%s" % (
10                           extractDay(event["date"]),
11                           matchers[key](event))
12           event_db.event_counts.update(
13                   {"_id": key},
14                   {"$inc": {count_key: 1}},
15                   multi=True, upsert=True)
16       event_db.events.insert(event)
17
Example
       How the results appear in Mongo
 1   > db.event_counts.find()
 2   {
 3       "_id": "front_page/broadcast_click",
 4       "counts": {
 5           "2010-04-20": {
 6               "big_red_button": 1231,
 7               "small_blue_button": 86
 8           }
 9       }
10   }
11   {
12       "_id": "broadcast/started",
13       "counts": {
14           "2010-04-20": {
15               "big_red_button": 72,
16               "small_blue_button": 6
17           }
18       }
19   }
20   >
What’s that we have there?



  First, Click the “Broadcast Button”

      Then, Start Broadcasting
We can add more events...

  First, Click the “Broadcast Button”

            Authenticate

  Click flash “Allow” or Disallow” box

         Share with friends

                   ...

     Then, Start Broadcasting
Periodic Map/Reduce



Computing a bunch of stuff every half hour is fine if its fast enough

 A program can generate arbitrarily complex Map/Reduce code...
Accurate Funnel Calculation

• Per user rollup
   – For each user, which steps in the funnel have they been
     at with constraints applied
   – A map to get unique users, a reduce to count which
     unique events they triggered
• Per bucket rollup
   – For each bucket, how many users at each ‘step’ in the
     funnel
   – Sum counts at each step per bucket
Same strategy...




All calculations ended up being done in batch jobs...
Thoughts...




Interactive performance poor during M/R jobs

      Eliot says this is fixed in 1.5.0 :-)
Thoughts...




Even so... its fast enough!
Future work



Migrating old Postgres-backed system to MongoDB

  Real-time calculation for timeseries calculation

   Batch jobs for Funnel, Retention, and Virality

More Related Content

Similar to Flexible Event Tracking (Paul Gebheim)

CQRS and Event Sourcing with MongoDB and PHP
CQRS and Event Sourcing with MongoDB and PHPCQRS and Event Sourcing with MongoDB and PHP
CQRS and Event Sourcing with MongoDB and PHPDavide Bellettini
 
Scaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabScaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabRoman
 
Implementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDBImplementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDBMongoDB
 
Event Visualization with OpenStreetMap Data, Interdisciplinary Project
Event Visualization with OpenStreetMap Data, Interdisciplinary ProjectEvent Visualization with OpenStreetMap Data, Interdisciplinary Project
Event Visualization with OpenStreetMap Data, Interdisciplinary ProjectBibek Shrestha
 
Ben ford intro
Ben ford introBen ford intro
Ben ford introPuppet
 
Telemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordTelemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordPuppet
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdfsash236
 
Budapest Spark Meetup - Apache Spark @enbrite.ly
Budapest Spark Meetup - Apache Spark @enbrite.lyBudapest Spark Meetup - Apache Spark @enbrite.ly
Budapest Spark Meetup - Apache Spark @enbrite.lyMészáros József
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBTakahiro Inoue
 
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)Dan Robinson
 
Operational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarOperational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarMongoDB
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...Gianfranco Palumbo
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Eric Sammer
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandMaarten Balliauw
 
E.D.D.I - Open Source Chatbot Platform
E.D.D.I - Open Source Chatbot PlatformE.D.D.I - Open Source Chatbot Platform
E.D.D.I - Open Source Chatbot PlatformGregor Jarisch
 
Advanced kapacitor
Advanced kapacitorAdvanced kapacitor
Advanced kapacitorInfluxData
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubySATOSHI TAGOMORI
 
Startup Safary | Fight against robots with enbrite.ly data platform
Startup Safary | Fight against robots with enbrite.ly data platformStartup Safary | Fight against robots with enbrite.ly data platform
Startup Safary | Fight against robots with enbrite.ly data platformMészáros József
 

Similar to Flexible Event Tracking (Paul Gebheim) (20)

CQRS and Event Sourcing with MongoDB and PHP
CQRS and Event Sourcing with MongoDB and PHPCQRS and Event Sourcing with MongoDB and PHP
CQRS and Event Sourcing with MongoDB and PHP
 
Scaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabScaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at Grab
 
Measuring Continuity
Measuring ContinuityMeasuring Continuity
Measuring Continuity
 
Implementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDBImplementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDB
 
Event Visualization with OpenStreetMap Data, Interdisciplinary Project
Event Visualization with OpenStreetMap Data, Interdisciplinary ProjectEvent Visualization with OpenStreetMap Data, Interdisciplinary Project
Event Visualization with OpenStreetMap Data, Interdisciplinary Project
 
Ben ford intro
Ben ford introBen ford intro
Ben ford intro
 
Telemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordTelemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben Ford
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
 
Budapest Spark Meetup - Apache Spark @enbrite.ly
Budapest Spark Meetup - Apache Spark @enbrite.lyBudapest Spark Meetup - Apache Spark @enbrite.ly
Budapest Spark Meetup - Apache Spark @enbrite.ly
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDB
 
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
Designing The Right Schema To Power Heap (PGConf Silicon Valley 2016)
 
Operational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarOperational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB Webinar
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays Finland
 
E.D.D.I - Open Source Chatbot Platform
E.D.D.I - Open Source Chatbot PlatformE.D.D.I - Open Source Chatbot Platform
E.D.D.I - Open Source Chatbot Platform
 
Advanced kapacitor
Advanced kapacitorAdvanced kapacitor
Advanced kapacitor
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
 
Startup Safary | Fight against robots with enbrite.ly data platform
Startup Safary | Fight against robots with enbrite.ly data platformStartup Safary | Fight against robots with enbrite.ly data platform
Startup Safary | Fight against robots with enbrite.ly data platform
 

More from MongoSF

Webinar: Typische MongoDB Anwendungsfälle (Common MongoDB Use Cases) 
Webinar: Typische MongoDB Anwendungsfälle (Common MongoDB Use Cases) Webinar: Typische MongoDB Anwendungsfälle (Common MongoDB Use Cases) 
Webinar: Typische MongoDB Anwendungsfälle (Common MongoDB Use Cases) MongoSF
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)MongoSF
 
C# Development (Sam Corder)
C# Development (Sam Corder)C# Development (Sam Corder)
C# Development (Sam Corder)MongoSF
 
Administration (Eliot Horowitz)
Administration (Eliot Horowitz)Administration (Eliot Horowitz)
Administration (Eliot Horowitz)MongoSF
 
Ruby Development and MongoMapper (John Nunemaker)
Ruby Development and MongoMapper (John Nunemaker)Ruby Development and MongoMapper (John Nunemaker)
Ruby Development and MongoMapper (John Nunemaker)MongoSF
 
MongoHQ (Jason McCay & Ben Wyrosdick)
MongoHQ (Jason McCay & Ben Wyrosdick)MongoHQ (Jason McCay & Ben Wyrosdick)
MongoHQ (Jason McCay & Ben Wyrosdick)MongoSF
 
Administration
AdministrationAdministration
AdministrationMongoSF
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)MongoSF
 
Practical Ruby Projects (Alex Sharp)
Practical Ruby Projects (Alex Sharp)Practical Ruby Projects (Alex Sharp)
Practical Ruby Projects (Alex Sharp)MongoSF
 
Implementing MongoDB at Shutterfly (Kenny Gorman)
Implementing MongoDB at Shutterfly (Kenny Gorman)Implementing MongoDB at Shutterfly (Kenny Gorman)
Implementing MongoDB at Shutterfly (Kenny Gorman)MongoSF
 
Debugging Ruby (Aman Gupta)
Debugging Ruby (Aman Gupta)Debugging Ruby (Aman Gupta)
Debugging Ruby (Aman Gupta)MongoSF
 
Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)MongoSF
 
MongoDB Replication (Dwight Merriman)
MongoDB Replication (Dwight Merriman)MongoDB Replication (Dwight Merriman)
MongoDB Replication (Dwight Merriman)MongoSF
 
Zero to Mongo in 60 Hours
Zero to Mongo in 60 HoursZero to Mongo in 60 Hours
Zero to Mongo in 60 HoursMongoSF
 
Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)
Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)
Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)MongoSF
 
PHP Development with MongoDB (Fitz Agard)
PHP Development with MongoDB (Fitz Agard)PHP Development with MongoDB (Fitz Agard)
PHP Development with MongoDB (Fitz Agard)MongoSF
 
Java Development with MongoDB (James Williams)
Java Development with MongoDB (James Williams)Java Development with MongoDB (James Williams)
Java Development with MongoDB (James Williams)MongoSF
 
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...MongoSF
 
From MySQL to MongoDB at Wordnik (Tony Tam)
From MySQL to MongoDB at Wordnik (Tony Tam)From MySQL to MongoDB at Wordnik (Tony Tam)
From MySQL to MongoDB at Wordnik (Tony Tam)MongoSF
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)MongoSF
 

More from MongoSF (20)

Webinar: Typische MongoDB Anwendungsfälle (Common MongoDB Use Cases) 
Webinar: Typische MongoDB Anwendungsfälle (Common MongoDB Use Cases) Webinar: Typische MongoDB Anwendungsfälle (Common MongoDB Use Cases) 
Webinar: Typische MongoDB Anwendungsfälle (Common MongoDB Use Cases) 
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)
 
C# Development (Sam Corder)
C# Development (Sam Corder)C# Development (Sam Corder)
C# Development (Sam Corder)
 
Administration (Eliot Horowitz)
Administration (Eliot Horowitz)Administration (Eliot Horowitz)
Administration (Eliot Horowitz)
 
Ruby Development and MongoMapper (John Nunemaker)
Ruby Development and MongoMapper (John Nunemaker)Ruby Development and MongoMapper (John Nunemaker)
Ruby Development and MongoMapper (John Nunemaker)
 
MongoHQ (Jason McCay & Ben Wyrosdick)
MongoHQ (Jason McCay & Ben Wyrosdick)MongoHQ (Jason McCay & Ben Wyrosdick)
MongoHQ (Jason McCay & Ben Wyrosdick)
 
Administration
AdministrationAdministration
Administration
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
 
Practical Ruby Projects (Alex Sharp)
Practical Ruby Projects (Alex Sharp)Practical Ruby Projects (Alex Sharp)
Practical Ruby Projects (Alex Sharp)
 
Implementing MongoDB at Shutterfly (Kenny Gorman)
Implementing MongoDB at Shutterfly (Kenny Gorman)Implementing MongoDB at Shutterfly (Kenny Gorman)
Implementing MongoDB at Shutterfly (Kenny Gorman)
 
Debugging Ruby (Aman Gupta)
Debugging Ruby (Aman Gupta)Debugging Ruby (Aman Gupta)
Debugging Ruby (Aman Gupta)
 
Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)
 
MongoDB Replication (Dwight Merriman)
MongoDB Replication (Dwight Merriman)MongoDB Replication (Dwight Merriman)
MongoDB Replication (Dwight Merriman)
 
Zero to Mongo in 60 Hours
Zero to Mongo in 60 HoursZero to Mongo in 60 Hours
Zero to Mongo in 60 Hours
 
Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)
Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)
Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)
 
PHP Development with MongoDB (Fitz Agard)
PHP Development with MongoDB (Fitz Agard)PHP Development with MongoDB (Fitz Agard)
PHP Development with MongoDB (Fitz Agard)
 
Java Development with MongoDB (James Williams)
Java Development with MongoDB (James Williams)Java Development with MongoDB (James Williams)
Java Development with MongoDB (James Williams)
 
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
 
From MySQL to MongoDB at Wordnik (Tony Tam)
From MySQL to MongoDB at Wordnik (Tony Tam)From MySQL to MongoDB at Wordnik (Tony Tam)
From MySQL to MongoDB at Wordnik (Tony Tam)
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
 

Recently uploaded

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Recently uploaded (20)

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Flexible Event Tracking (Paul Gebheim)

  • 1. Flexible Event Logging Analyzing Funnels, Retention, and Viral Spread with MongoDB Paul Gebheim - Justin.tv
  • 2. How can we effectively use our data to make Justin.tv better?
  • 3. Questions Who does what, and how? Funnels How valuable are groups of users? Virality Are our changes working? Retention, Funnel Conversion
  • 4. The Dream A general framework for creating, deploying, and analyzing A/B tests in terms of Funnels, Virality, and Retention.
  • 5. Backend Dreams Flexibility Queryability Scalability ... and it should be easy to work with
  • 6. Backend Dreams... come true Schema-less! Rich data access/manipulation toolset At home in a web-centric toolchain Sharding, Map/Reduce, Replication
  • 8. Aggregating Data Web Site Events [{     "name": "front_page/broadcast_click",     "date": "2010-04-20 12:00:00-7000",     "unique_id": "fRx8zq",     "bucket": "big_red_button" }, {     "name": "front_page/broadcast_click",     "date": "2010-04-20 12:01:00-7000",     "unique_id": "9aB8c2",     "bucket": "small_blue_button" }]
  • 9. Aggregating Data Video System Events [{     "name": "broadcast/started",     "date": "2010-04-20 12:10:00-7000",     "unique_id": "fRx8zq",     "bucket": "big_red_button",     "channel": "my_1337_ch4nn31l", }]
  • 10. Processing Data Python Map/Reduce Configuration Documents Generate/Apply MongoDB operations
  • 11. Example: Count how many times each event occurs per 'bucket'
  • 12. Example Historical Data with SQL: 1 select 2     event_name, bucket, count(*) 3 from 4     events 5 group by event_name, bucket;
  • 13. Mongo can do that! For small datasets, use collection.group()  1 var count_events_per_bucket = function() {  2     return db.events.group({  3         key: {name: 1, bucket: 1},  4         cond: {/* include all events */},  5         reduce: function(event, aggregate) {  6             aggregate.count += 1;  7         },  8         initial: {  9             count: 0 10         } 11     }); 12 }
  • 14. Mongo can do that! For large datasets, use collection.mapReduce()  1 var count_events_per_bucket_big = function() {  2     var res = db.events.mapReduce(  3         // map  4         function() {  5             emit({  6                 name: this.name,  7                 bucket: this.bucket  8             }, 1);  9         }, 10         // reduce 11         function(key, values_list) { 12             var count=0; 13             each(values_list, function(v,n) { 14                 count += v; 15 }); 16             return count; 17         } 18     ); 19 20     return db[res.result].find(); 21 };
  • 15. Mongo can also... be used to do the counting in real time!  1 matchers = {  2      "front_page/broadcast_click": lambda event: event["bucket"],  3      "broadcast/started": lambda event["bucket"]  4 }  5  6 for event in events:  7     key = event["name"]  8     if key in matchers:  9         count_key = "counts.%s.%s" % ( 10                         extractDay(event["date"]), 11                         matchers[key](event)) 12         event_db.event_counts.update( 13                 {"_id": key}, 14                 {"$inc": {count_key: 1}}, 15                 multi=True, upsert=True) 16     event_db.events.insert(event) 17
  • 16. Example How the results appear in Mongo  1 > db.event_counts.find()  2 {  3     "_id": "front_page/broadcast_click",  4     "counts": {  5         "2010-04-20": {  6             "big_red_button": 1231,  7             "small_blue_button": 86  8         }  9     } 10 } 11 { 12     "_id": "broadcast/started", 13     "counts": { 14         "2010-04-20": { 15             "big_red_button": 72, 16             "small_blue_button": 6 17         } 18     } 19 } 20 >
  • 17. What’s that we have there? First, Click the “Broadcast Button” Then, Start Broadcasting
  • 18. We can add more events... First, Click the “Broadcast Button” Authenticate Click flash “Allow” or Disallow” box Share with friends ... Then, Start Broadcasting
  • 19. Periodic Map/Reduce Computing a bunch of stuff every half hour is fine if its fast enough A program can generate arbitrarily complex Map/Reduce code...
  • 20. Accurate Funnel Calculation • Per user rollup – For each user, which steps in the funnel have they been at with constraints applied – A map to get unique users, a reduce to count which unique events they triggered • Per bucket rollup – For each bucket, how many users at each ‘step’ in the funnel – Sum counts at each step per bucket
  • 21.
  • 22. Same strategy... All calculations ended up being done in batch jobs...
  • 23. Thoughts... Interactive performance poor during M/R jobs Eliot says this is fixed in 1.5.0 :-)
  • 25. Future work Migrating old Postgres-backed system to MongoDB Real-time calculation for timeseries calculation Batch jobs for Funnel, Retention, and Virality

Editor's Notes