SlideShare a Scribd company logo
1 of 28
Download to read offline
Snowplow Meetup
London, February 2017
2
busuu is the world’s leading social
network for language learning
Language courses Social network
+
• Access to native speakers
• Peer to peer text corrections
• High quality courses in 12 languages
• Beginner to advanced intermediate level
1 2
How does busuu work?
Most important
vocabulary
Key
grammar
Practice with
native speakers
Faster
fluency
busuu is a complete self-study and language practice environment
3
busuu 2016
What sort of data do we use?
● Front end tracking data
● Progress data
● Backend db data
● Third party data
Why did we
look at using
Snowplow
busuu 2016
Problems
My data says
X, why does
yours say Y?
Cloudwatch
Alert!
“Why can’t i find
the results of my
A/B test till
tomorrow again?
“Oh my god, do
we really have to
put yet another
tracker in?”
busuu 2016
Scalability
busuu 2016
Batch vs Real time
busuu 2016
Reconciliation
busuu 2016
Then we thought...
Can we use
snowplow
framework for
more than just
analytics?
busuu 2016
Too many SKDs and trackers
busuu 2016
Snowplow delivery
How do we get
Snowplow to deliver
the events to
everybody/thing that
needs it, instead of
adding more trackers
to the frontend
Tech Stack
busuu 2016
Data Collection Phase
Events Events Backend
Data
API Calls
Yet to be done
Scala Stream
Collector
busuu 2016
Processing
15
Stream
Enrich
Raw Data Enriched Data
busuu 2016
Processing
Validation
● Customised busuu event
schemas
● Different based on environment
Enrichments
● ip lookup
● Forex Conversion
busuu 2016
Distribution
17
Results
back to
App/SiteMachine Learning
Models
Yet to
be done
busuu 2016
Plug & Play Integrations
18
● One source of truth
● Scalability
● Third party systems can be added very quickly
busuu 2016
Lambda?
19
Parse through each
field of enriched
data looking for
custom schema
name
One lambda function
per type of data and
per integration
Relay required data to
third party service
through REST api or
given python client
Problem areas
20 busuu 2016
busuu 2016
Main implementation bugbears
1. Strict Multi Platform Schemas
2. Offline mode delay
3. Device vs Collector Timestamps
Future
Improvements
22 busuu 2016
busuu 2016
Future projects
● Live A/B test trains & results
● Live machine learning results in app
● Automated alerting on complex company metrics.
Thanks!
Bruce Pannaman
busuu 2016
Frontend event data
● Track and find issues in user behavior.
● Insight into product usage
● A/B Testing
● CRM cohorting
● In-app message cohorting
busuu 2016
Progress Data
● What has a user learnt?
● How is our content performing
● What is their language level?
● Vocabulary lists
busuu 2016
Backend Data
● What are the user’s attributes
● Social relationships (friends)
● Writing exercises and comments
busuu 2016
Third Party
● Payments
● CRM performance
● App store metadata (review etc.)
● PPC data

More Related Content

Similar to Snowplow at the heart of Busuu's data & analytics infrastructure

Open Day - September 2016
Open Day - September 2016Open Day - September 2016
Open Day - September 2016Neil Lasrado
 
Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionSammy Fung
 
SharePoint 2016 & Office 365: A Look Ahead To What's Coming - SPS Vancouver
SharePoint 2016 & Office 365: A Look Ahead To What's Coming - SPS VancouverSharePoint 2016 & Office 365: A Look Ahead To What's Coming - SPS Vancouver
SharePoint 2016 & Office 365: A Look Ahead To What's Coming - SPS VancouverRichard Harbridge
 
Google summer of code with drupal
Google summer of code with drupalGoogle summer of code with drupal
Google summer of code with drupalNaveen Valecha
 
Sakshi sharma resume
Sakshi sharma resumeSakshi sharma resume
Sakshi sharma resumeSakshi Sharma
 
Bringing the Transcript to Life
Bringing the Transcript to Life  Bringing the Transcript to Life
Bringing the Transcript to Life Jonathan Mott
 
Learning Engineering Initiatives at Harvard DCE
Learning Engineering Initiatives at Harvard DCELearning Engineering Initiatives at Harvard DCE
Learning Engineering Initiatives at Harvard DCEJay Luker
 
How to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryHow to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryAli Dasdan
 
#twbconf 2017: Digital transformation in London - Natalie Taylor, Mayor of Lo...
#twbconf 2017: Digital transformation in London - Natalie Taylor, Mayor of Lo...#twbconf 2017: Digital transformation in London - Natalie Taylor, Mayor of Lo...
#twbconf 2017: Digital transformation in London - Natalie Taylor, Mayor of Lo...Together We're Better
 
Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...Christophe Guéret
 
Mapping presentation THAG big data from space
Mapping presentation THAG big data from spaceMapping presentation THAG big data from space
Mapping presentation THAG big data from spaceBartosz Szkudlarek
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
Frappe Open Day - March 2018
Frappe Open Day - March 2018Frappe Open Day - March 2018
Frappe Open Day - March 2018Kenneth Sequeira
 
Choosing the Right Database - Facebook DevC Malang Hackdays 2017
Choosing the Right Database - Facebook DevC Malang Hackdays 2017Choosing the Right Database - Facebook DevC Malang Hackdays 2017
Choosing the Right Database - Facebook DevC Malang Hackdays 2017Rendy Bambang Junior
 
Data as a service
Data as a serviceData as a service
Data as a serviceZoltan Nagy
 

Similar to Snowplow at the heart of Busuu's data & analytics infrastructure (20)

Benchmarking Linked Data Introductory Remarks
Benchmarking Linked Data Introductory RemarksBenchmarking Linked Data Introductory Remarks
Benchmarking Linked Data Introductory Remarks
 
Open Day - September 2016
Open Day - September 2016Open Day - September 2016
Open Day - September 2016
 
Local Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell ExtensionLocal Weather Information and GNOME Shell Extension
Local Weather Information and GNOME Shell Extension
 
SharePoint 2016 & Office 365: A Look Ahead To What's Coming - SPS Vancouver
SharePoint 2016 & Office 365: A Look Ahead To What's Coming - SPS VancouverSharePoint 2016 & Office 365: A Look Ahead To What's Coming - SPS Vancouver
SharePoint 2016 & Office 365: A Look Ahead To What's Coming - SPS Vancouver
 
Hobbit in a Nutshell - EDF2016
Hobbit in a Nutshell - EDF2016Hobbit in a Nutshell - EDF2016
Hobbit in a Nutshell - EDF2016
 
Google summer of code with drupal
Google summer of code with drupalGoogle summer of code with drupal
Google summer of code with drupal
 
Frappe Open Day - February 2017
Frappe Open Day - February 2017Frappe Open Day - February 2017
Frappe Open Day - February 2017
 
Sakshi sharma resume
Sakshi sharma resumeSakshi sharma resume
Sakshi sharma resume
 
Bringing the Transcript to Life
Bringing the Transcript to Life  Bringing the Transcript to Life
Bringing the Transcript to Life
 
Learning Engineering Initiatives at Harvard DCE
Learning Engineering Initiatives at Harvard DCELearning Engineering Initiatives at Harvard DCE
Learning Engineering Initiatives at Harvard DCE
 
How to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryHow to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st century
 
#twbconf 2017: Digital transformation in London - Natalie Taylor, Mayor of Lo...
#twbconf 2017: Digital transformation in London - Natalie Taylor, Mayor of Lo...#twbconf 2017: Digital transformation in London - Natalie Taylor, Mayor of Lo...
#twbconf 2017: Digital transformation in London - Natalie Taylor, Mayor of Lo...
 
Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...Stop making tools! Nobody likes them anyway...
Stop making tools! Nobody likes them anyway...
 
Mapping presentation THAG big data from space
Mapping presentation THAG big data from spaceMapping presentation THAG big data from space
Mapping presentation THAG big data from space
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Frappe Open Day - March 2018
Frappe Open Day - March 2018Frappe Open Day - March 2018
Frappe Open Day - March 2018
 
Frappe Open Day - March 2018
Frappe Open Day - March 2018Frappe Open Day - March 2018
Frappe Open Day - March 2018
 
Maimoona g so-c - 2021
Maimoona   g so-c - 2021Maimoona   g so-c - 2021
Maimoona g so-c - 2021
 
Choosing the Right Database - Facebook DevC Malang Hackdays 2017
Choosing the Right Database - Facebook DevC Malang Hackdays 2017Choosing the Right Database - Facebook DevC Malang Hackdays 2017
Choosing the Right Database - Facebook DevC Malang Hackdays 2017
 
Data as a service
Data as a serviceData as a service
Data as a service
 

Recently uploaded

Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 

Recently uploaded (20)

Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 

Snowplow at the heart of Busuu's data & analytics infrastructure