SNOWPLOW AND LOOKER
AT OYSTER.COM
SNOWPLOW MEETUP NYC – MARCH 30, 2016
BEN HOYT, DEVON POHL
WHAT IS OYSTER.COM?
• “The Hotel Tell-All”
• Authentic hotel reviews and photos
• We visit every hotel in person
• 1000 hotels per month
• 7M high-res photos
• 100k 360° panoramas
(SOME OF) OUR TECH STACK
• Python to run our backend: web, scripting, photo processing, ETL
• PostgreSQL for all content data (e.g., hotels, metadata for 12M images)
• Amazon S3 for image storage, EC2 spot instances for photo processing
• Amazon Redshift for analytics and reporting data
• Looker for reporting and visualizations
• Snowplow for analytics tracking and analytics ETL
GOOGLE ANALYTICS V. SNOWPLOW
Google Analytics
• Good for web, but little control and flexibility
• Hard to get data out of (your data!)
• Crazy pricing model ($0 for free tier, or $150,000/y for premium)
• Can only do web analytics, not other business reporting
Snowplow
• Free and open source, with great support and paid tiers
• Puts data into a standard, easily-queryable database (Redshift)
• Focuses on tracking and analytics ETL and does that part well
WHY & HOW WE SWITCHED (1 YEAR AGO)
• We were considering Looker for reporting and visualization
• Looker rep: “majority of our customers use Snowplow to collect their data”
• We dug into Snowplow and liked what we saw
• Initially the design felt like a bit of overkill, but it’s definitely built to scale
• We implemented the tracking and pipeline, and haven’t looked back
OUR CONTEXT SCHEMA
• We use one “custom fields” schema to rule them all
• Simple: one table, and one SQL join gives us all our custom fields
{
  "self": {"name": "custom_fields", "vendor": "com.oyster", "version": "1-0-9"},
  "properties": {
    "page_type": {"type": "string"},
    "page_subtype": {"type": "string"},
    "template_type": {"type": "string", "enum": ["desktop", "mobile"]},
    "hotel_id": {"$ref": "#/definitions/positiveInteger32"},
    "account_id": {"$ref": "#/definitions/positiveInteger32"},
    "ab_cell": {"type": "integer", "minimum": 1, "maximum": 20},
    "checkin_date": {"type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"},
    ...
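As a rough sketch of how an event could carry this context: Snowplow attaches contexts as self-describing JSON, pairing an Iglu schema URI (iglu:vendor/name/format/version) with the data. The Python below builds such a payload for com.oyster/custom_fields/1-0-9 and checks only the constraints visible in the truncated schema, using the standard library. The helper names, the sample values, and the reading of `positiveInteger32` as 1..2^31-1 (its definition is elided) are all assumptions; in production Snowplow validates contexts against the registered schema itself.

```python
import json
import re

# Iglu URI for the schema above, in the form iglu:<vendor>/<name>/<format>/<version>
CUSTOM_FIELDS_SCHEMA = "iglu:com.oyster/custom_fields/jsonschema/1-0-9"

# Assumption: "positiveInteger32" (definition elided) means 1..2**31 - 1
INT32_MAX = 2**31 - 1


def _is_int(value):
    """True for real integers; bool is excluded because it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def check_custom_fields(data):
    """Return a list of problems, checking only the fields visible on the slide."""
    problems = []
    for key in ("page_type", "page_subtype"):
        if key in data and not isinstance(data[key], str):
            problems.append(f"{key} must be a string")
    if "template_type" in data and data["template_type"] not in ("desktop", "mobile"):
        problems.append("template_type must be 'desktop' or 'mobile'")
    for key in ("hotel_id", "account_id"):
        if key in data and not (_is_int(data[key]) and 1 <= data[key] <= INT32_MAX):
            problems.append(f"{key} must be a positive 32-bit integer")
    if "ab_cell" in data and not (_is_int(data["ab_cell"]) and 1 <= data["ab_cell"] <= 20):
        problems.append("ab_cell must be an integer from 1 to 20")
    if "checkin_date" in data:
        value = data["checkin_date"]
        # fullmatch mirrors the schema's ^...$ anchors
        if not (isinstance(value, str) and re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value)):
            problems.append("checkin_date must look like YYYY-MM-DD")
    return problems


def build_custom_fields_context(**fields):
    """Wrap fields as a self-describing JSON context, raising on invalid values."""
    problems = check_custom_fields(fields)
    if problems:
        raise ValueError("; ".join(problems))
    return {"schema": CUSTOM_FIELDS_SCHEMA, "data": fields}


# Hypothetical values for a desktop hotel page view
context = build_custom_fields_context(
    page_type="hotel", template_type="desktop", hotel_id=1234, ab_cell=7,
    checkin_date="2016-03-30",
)
print(json.dumps(context, sort_keys=True))
```

A context like this would normally be handed to a Snowplow tracker alongside the event it describes, not sent on its own.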
OUR DATASET
• A large, though not massive, dataset
• Redshift cluster: 6 dc1.large SSD nodes, ~1TB storage
• 640 million rows in our events table
• We add 1.5 million event rows per day
• We copy (a subset of) our PostgreSQL content database into Redshift nightly
  • Enables business reporting and advanced content-based queries
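To illustrate the "one SQL join" idea together with the nightly content copy, here is a small sketch that uses Python's built-in sqlite3 as a stand-in for Redshift. It assumes Snowplow's usual shredded-table convention, where each context row carries root_id and root_tstamp pointing back to the event's event_id and collector_tstamp (joining on the timestamp as well is often recommended on Redshift). The table name com_oyster_custom_fields_1, the hotels table, and every sample row are hypothetical, and the atomic schema prefix is omitted for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for atomic.events
    CREATE TABLE events (event_id TEXT, collector_tstamp TEXT, event TEXT);
    -- Simplified stand-in for the shredded custom_fields context table
    CREATE TABLE com_oyster_custom_fields_1 (
        root_id TEXT, root_tstamp TEXT, page_type TEXT, hotel_id INTEGER);
    -- Hypothetical subset of the PostgreSQL content DB, copied nightly
    CREATE TABLE hotels (hotel_id INTEGER PRIMARY KEY, name TEXT, city TEXT);

    INSERT INTO events VALUES
        ('e1', '2016-03-30 10:00:00', 'page_view'),
        ('e2', '2016-03-30 10:05:00', 'page_view'),
        ('e3', '2016-03-30 10:07:00', 'page_view');
    INSERT INTO com_oyster_custom_fields_1 VALUES
        ('e1', '2016-03-30 10:00:00', 'hotel', 1),
        ('e2', '2016-03-30 10:05:00', 'hotel', 2),
        ('e3', '2016-03-30 10:07:00', 'hotel', 1);
    INSERT INTO hotels VALUES (1, 'Hotel A', 'New York'), (2, 'Hotel B', 'Miami');
""")

# Hotel page views per city: one join to the custom fields, one to content data
rows = conn.execute("""
    SELECT h.city, COUNT(*) AS page_views
    FROM events e
    JOIN com_oyster_custom_fields_1 cf
      ON cf.root_id = e.event_id AND cf.root_tstamp = e.collector_tstamp
    JOIN hotels h ON h.hotel_id = cf.hotel_id
    WHERE e.event = 'page_view' AND cf.page_type = 'hotel'
    GROUP BY h.city
    ORDER BY page_views DESC, h.city
""").fetchall()
print(rows)
```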
PAGE TRACKING EXAMPLE
ANALYTICS AND LOOKER (DEVON POHL)
REPORTING
• Snowplow and content data are merged to provide insights into:
  • Product
    • A/B testing
    • Funnel mapping
  • Marketing
    • SEO monitoring
    • Ad Campaigns
  • Operations
    • Workflow Optimization
    • ROI Modeling
  • Business Trends
    • Traffic
    • Revenue
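For the A/B testing use case, a minimal sketch: given visit-level rows that carry the ab_cell value from the custom_fields context (an integer from 1 to 20), group by cell and compute a conversion rate. The `converted` flag and the sample visits are hypothetical, since what counts as a conversion is not specified in the slides, and a real analysis would also test for statistical significance.

```python
from collections import defaultdict


def conversion_by_cell(visits):
    """Return {ab_cell: (visits, conversions, rate)}, skipping visits with no cell."""
    totals = defaultdict(lambda: [0, 0])  # ab_cell -> [visits, conversions]
    for visit in visits:
        cell = visit.get("ab_cell")
        if cell is None:
            continue  # visit was not assigned to a test cell
        totals[cell][0] += 1
        if visit.get("converted"):
            totals[cell][1] += 1
    return {cell: (n, conv, conv / n) for cell, (n, conv) in sorted(totals.items())}


# Hypothetical visit rows
sample_visits = [
    {"ab_cell": 1, "converted": True},
    {"ab_cell": 1, "converted": False},
    {"ab_cell": 2, "converted": True},
    {"ab_cell": 2, "converted": True},
    {"ab_cell": None, "converted": False},
]
results = conversion_by_cell(sample_visits)
for cell, (n, conv, rate) in results.items():
    print(f"cell {cell}: {conv}/{n} converted ({rate:.0%})")
```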
VISIT TABLE
• Event data is large and granular, and often hard to digest
• Building the visit table is the most valuable pre-processing we do
  • Built incrementally by a Python ETL job that runs on Redshift
  • Key to most of our reporting infrastructure
  • Combines events and custom fields data
• The visit table:
  • Has one row per user and session ID
  • Includes counts of a variety of event types
  • Includes all information associated with the first event of a visit:
    • A/B testing cells
    • Referral information
    • Etc.
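A minimal sketch of the visit-table idea in plain Python, assuming events arrive as dicts with the standard Snowplow atomic.events columns domain_userid, domain_sessionidx, collector_tstamp, event and refr_urlhost, plus an ab_cell value taken from the custom_fields context. "Incremental" is approximated here by rebuilding only the visits touched by events newer than a high-water mark; the function and helper names, the chosen first-event columns, and the high-water-mark approach are assumptions, not Oyster's actual ETL.

```python
from collections import Counter


def build_visits(events, existing_visits=None, since=None):
    """Aggregate events into one row per (domain_userid, domain_sessionidx).

    With `since`, only visits that have an event newer than `since` are rebuilt;
    other rows of `existing_visits` are carried over unchanged. `events` must then
    include every event of the touched visits, not just the new ones.
    Timestamps are 'YYYY-MM-DD HH:MM:SS' strings, so string order is time order.
    """
    visits = dict(existing_visits or {})
    touched = {
        (e["domain_userid"], e["domain_sessionidx"])
        for e in events
        if since is None or e["collector_tstamp"] > since
    }
    by_visit = {}
    for e in sorted(events, key=lambda ev: ev["collector_tstamp"]):
        key = (e["domain_userid"], e["domain_sessionidx"])
        if key in touched:
            by_visit.setdefault(key, []).append(e)
    for key, visit_events in by_visit.items():
        first = visit_events[0]  # visit-level attributes come from the first event
        visits[key] = {
            "start_tstamp": first["collector_tstamp"],
            "ab_cell": first.get("ab_cell"),
            "refr_urlhost": first.get("refr_urlhost"),
            "event_counts": Counter(e["event"] for e in visit_events),
        }
    return visits


def make_event(user, session, tstamp, name, refr=None, cell=None):
    """Hypothetical helper that builds a sample event row."""
    return {"domain_userid": user, "domain_sessionidx": session,
            "collector_tstamp": tstamp, "event": name,
            "refr_urlhost": refr, "ab_cell": cell}


events = [
    make_event("u1", 1, "2016-03-30 10:00:00", "page_view", "www.google.com", 3),
    make_event("u1", 1, "2016-03-30 10:02:00", "page_ping", None, 3),
    make_event("u2", 4, "2016-03-30 11:00:00", "page_view", None, 8),
]
visits = build_visits(events)

# A later run: only the u2 visit has a new event, so only it is rebuilt
later = make_event("u2", 4, "2016-03-30 11:05:00", "page_view", None, 8)
updated = build_visits(events + [later], existing_visits=visits, since="2016-03-30 11:00:00")
print(updated[("u2", 4)]["event_counts"])
```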
LOOKER
• Looker is our core data exploration and reporting tool
• Web-based YAML modeling and visualization layer on top of Redshift
• Lets non-technical business owners self-serve their reporting and exploration
• Also used for additional pre-processing via persistent derived tables (PDTs)
  • PDTs are tables defined by a SQL query, which Looker builds, stores, and rebuilds as needed
  • Good for small-to-medium-sized pre-processing
  • Applications include de-duping and revenue attribution
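De-duplication is a common PDT use because Snowplow event data can occasionally contain the same event_id more than once (for example, from retries). The sketch below shows the kind of logic such a PDT query might encode, written in plain Python rather than LookML or SQL: keep one row per event_id, preferring the earliest collector_tstamp. The tie-breaking rule and the sample rows are assumptions; the actual PDT definitions are not shown in the slides.

```python
def dedupe_events(events):
    """Keep one row per event_id, preferring the earliest collector_tstamp.

    Roughly what a SQL PDT might express as
    ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY collector_tstamp) = 1.
    On equal timestamps, the first row seen is kept.
    """
    kept = {}
    for event in events:
        current = kept.get(event["event_id"])
        if current is None or event["collector_tstamp"] < current["collector_tstamp"]:
            kept[event["event_id"]] = event
    # Return rows in time order for readability
    return sorted(kept.values(), key=lambda e: e["collector_tstamp"])


# Hypothetical rows: event "a" was loaded twice
raw_events = [
    {"event_id": "a", "collector_tstamp": "2016-03-30 10:00:01"},
    {"event_id": "a", "collector_tstamp": "2016-03-30 10:00:00"},
    {"event_id": "b", "collector_tstamp": "2016-03-30 10:01:00"},
]
deduped = dedupe_events(raw_events)
print([(e["event_id"], e["collector_tstamp"]) for e in deduped])
```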
DASHBOARDS / SAVED REPORTS
EXPLORATION
OYSTER.COM
The Hotel Tell-All
Discover Why Less is More in B2B Research
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 

Snowplow Analytics and Looker at Oyster.com

  • 1. SNOWPLOW AND LOOKER AT OYSTER.COM
      SNOWPLOW MEETUP NYC – MARCH 30, 2016
      BEN HOYT, DEVON POHL
  • 2. WHAT IS OYSTER.COM?
      • “The Hotel Tell-All”
      • Authentic hotel reviews and photos
      • We visit every hotel in person
      • 1000 hotels per month
      • 7M high-res photos
      • 100k 360° panoramas
  • 3. (SOME OF) OUR TECH STACK
      • Python to run our backend: web, scripting, photo processing, ETL
      • PostgreSQL for all content data (e.g., hotels, metadata for 12M images)
      • Amazon S3 for image storage, EC2 spot instances for photo processing
      • Amazon Redshift for analytics and reporting data
      • Looker for reporting and visualizations
      • Snowplow for analytics tracking and analytics ETL
  • 4. GOOGLE ANALYTICS V. SNOWPLOW
      • Google Analytics
          • Good for web, but little control and flexibility
          • Hard to get data out of (your data!)
          • Crazy pricing model ($0 for free tier, or $150,000/year for premium)
          • Can only do web analytics, not other business reporting
      • Snowplow
          • Free and open source, with great support and paid tiers
          • Puts data into a standard, easily-queryable database (Redshift)
          • Focuses on tracking and analytics ETL and does that part well
  • 5. WHY & HOW WE SWITCHED (1 YEAR AGO)
      • We were considering Looker for reporting and visualization
      • Looker rep: “majority of our customers use Snowplow to collect their data”
      • We dug into Snowplow and liked what we saw
      • Initially the design felt a bit overkill, but it’s definitely built to scale
      • We implemented the tracking and pipeline, and haven’t looked back
  • 6. OUR CONTEXT SCHEMA
      • We use one “custom fields” schema to rule them all
      • Simple: one table, and one SQL join gives us all our custom fields

        {
          "self": {"name": "custom_fields", "vendor": "com.oyster", "version": "1-0-9"},
          "properties": {
            "page_type": {"type": "string"},
            "page_subtype": {"type": "string"},
            "template_type": {"type": "string", "enum": ["desktop", "mobile"]},
            "hotel_id": {"$ref": "#/definitions/positiveInteger32"},
            "account_id": {"$ref": "#/definitions/positiveInteger32"},
            "ab_cell": {"type": "integer", "minimum": 1, "maximum": 20},
            "checkin_date": {"type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"},
            ...
  • 7. OUR DATASET
      • A large, though not massive, dataset
      • Redshift cluster: 6 dc1.large SSD nodes, ~1TB storage
      • 640 million rows in our events table
      • We add 1.5 million event rows per day
      • We copy (a subset of) our PostgreSQL content database into Redshift nightly
          • Enables business reporting and advanced content-based queries
  • 9. ANALYTICS AND LOOKER (DEVON POHL)
  • 10. REPORTING
      • Snowplow and content data are merged to provide insights into:
          • Product
              • A/B testing
              • Funnel mapping
          • Marketing
              • SEO monitoring
              • Ad Campaigns
          • Operations
              • Workflow Optimization
              • ROI Modeling
          • Business Trends
              • Traffic
              • Revenue
  • 11. VISIT TABLE
      • Event data is large and granular, and often hard to digest
      • The most valuable pre-processing we do is building the visit table
          • Built incrementally by a Python ETL job run on Redshift
          • Key to most of our reporting infrastructure
          • Combines event and custom-fields data
      • The visit table:
          • Is granular to the user and user-session-ID level
          • Includes counts of a variety of event types
          • Includes all information associated with the first event of a visit
              • A/B testing cells
              • Referral information
              • Etc.
  • 12. LOOKER
      • Looker is our core data exploration and reporting tool
      • Web-based YAML + visualization wrapper on Redshift
      • Lets non-technical business owners self-serve their reporting and exploration
      • Used for other pre-processing via persistent derived tables (PDTs)
          • PDTs are query-defined temporary tables built and managed by Looker
          • Good for small-to-medium-size pre-processing
          • Applications include de-duping and revenue attribution
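
Slide 6 says a single SQL join is enough to pull in every custom field. The sketch below reproduces that join in an in-memory SQLite database so it runs anywhere. It assumes the Snowplow convention for Redshift context tables, as we understand it, where each context row carries `root_id` and `root_tstamp` pointing back to its parent event's `event_id` and `collector_tstamp`. The table names, the column subset, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical, simplified stand-ins for Snowplow's events table and the
# custom_fields context table (the real tables have many more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        event_id         TEXT,
        collector_tstamp TEXT,
        page_urlpath     TEXT
    );
    CREATE TABLE custom_fields (
        root_id       TEXT,  -- event_id of the parent event
        root_tstamp   TEXT,  -- collector_tstamp of the parent event
        page_type     TEXT,
        template_type TEXT,
        hotel_id      INTEGER,
        ab_cell       INTEGER
    );
    INSERT INTO events VALUES
        ('e1', '2016-03-01 10:00:00', '/hotels/1'),
        ('e2', '2016-03-01 10:01:00', '/');
    INSERT INTO custom_fields VALUES
        ('e1', '2016-03-01 10:00:00', 'hotel', 'desktop', 1, 7),
        ('e2', '2016-03-01 10:01:00', 'home', 'mobile', NULL, 3);
""")

# One join attaches every custom field to its event.
rows = conn.execute("""
    SELECT e.event_id, cf.page_type, cf.template_type, cf.hotel_id, cf.ab_cell
    FROM events AS e
    JOIN custom_fields AS cf
      ON cf.root_id = e.event_id
     AND cf.root_tstamp = e.collector_tstamp
    ORDER BY e.event_id
""").fetchall()

for row in rows:
    print(row)
```

Because all custom fields live in one context table, adding a field to the schema means one more column to select, not one more table to join.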

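The visit table on slide 11 is built by a Python ETL job that the deck does not show. As a rough illustration only, the sketch below performs the same kind of roll-up in memory: one row per user and session, counts per event type, and referral and A/B-cell values copied from the first event of the visit. The field names are hypothetical (loosely modelled on Snowplow's `domain_userid` and `domain_sessionidx` columns), and the real job presumably runs as SQL on Redshift rather than in Python memory.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    user_id: str             # hypothetical; roughly Snowplow's domain_userid
    session_id: int          # hypothetical; roughly Snowplow's domain_sessionidx
    tstamp: str              # ISO-8601 text, so string order matches time order
    event_type: str          # e.g. "page_view", "click"
    referrer: Optional[str]
    ab_cell: Optional[int]


@dataclass
class Visit:
    user_id: str
    session_id: int
    first_tstamp: str
    referrer: Optional[str]  # copied from the first event of the visit
    ab_cell: Optional[int]   # copied from the first event of the visit
    event_counts: Counter = field(default_factory=Counter)


def build_visits(events: List[Event]) -> List[Visit]:
    """Collapse events into one row per (user, session)."""
    visits = {}
    # Sorting by time within each session makes the first event seen the earliest.
    for ev in sorted(events, key=lambda e: (e.user_id, e.session_id, e.tstamp)):
        key = (ev.user_id, ev.session_id)
        if key not in visits:
            visits[key] = Visit(ev.user_id, ev.session_id, ev.tstamp,
                                ev.referrer, ev.ab_cell)
        visits[key].event_counts[ev.event_type] += 1
    return list(visits.values())


events = [
    Event("u1", 1, "2016-03-01T10:00:00", "page_view", "google.com", 7),
    Event("u1", 1, "2016-03-01T10:02:00", "click", None, 7),
    Event("u1", 1, "2016-03-01T10:05:00", "page_view", None, 7),
    Event("u2", 1, "2016-03-01T11:00:00", "page_view", None, 3),
]
visits = build_visits(events)
for v in visits:
    print(v.user_id, v.session_id, v.referrer, v.ab_cell, dict(v.event_counts))
```

On Redshift the same roll-up would be a GROUP BY on the user and session keys, with the first-event columns picked by timestamp; the in-memory version simply makes the logic easy to test.
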
Editor's Notes

  1. Hi – I’m Devon. I work with Oyster.com and Jetsetter sites at Tripadvisor. Just after I joined, Oyster implemented Snowplow and I’ve spent much of the last year using the platform to build reporting and analytics.
  2. We use Snowplow data for a wide variety of things. With Snowplow we learn about: Business health: we have dashboards and daily updates on traffic, revenue, and other metrics. We use Looker for most of this, which I’ll talk about more in a moment. Product: we’ve learned about our site and users through A/B testing and site traffic mapping. Marketing: Snowplow allows us to monitor and analyze marketing efforts, including SEO. Operations: we monitor how existing assets are performing and model how prospective assets would likely perform, to prioritize work for the ops and editorial teams. A clear and intuitive context schema is key to marrying Snowplow to other data sources. Thanks, Ben!
  3. The event-level tables can be difficult to use in their raw form. For most reporting and analytics we use a derivative of the events table: our visit table. It is currently updated incrementally each night by a straightforward ETL job, and it includes session-level event counts and first-event referral information. We also do marketing-campaign and email-related pre-processing on the events table.
  4. Looker is our core data exploration and reporting tool. I have a couple of screenshots of dashboards and exploration in Looker on the following slides. We use Looker for: automated email reporting; saved dashboards and individual reports; data exploration, even by non-technical business owners; and small-to-medium-sized pre-processing jobs, such as de-duping and revenue attribution. This pre-processing is done through persistent derived tables, which are query-defined temporary tables built and managed by Looker. Because Looker manages them and they can be modified on the fly, they reduce the development and maintenance cost of our pre-processing infrastructure.
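
Note 4 lists de-duping as one of the PDT jobs. A PDT is defined by a SQL query; on Redshift a de-dup step is often written with `ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY collector_tstamp)` and a filter keeping row number 1. The Python below mirrors that logic so it can be checked in isolation. The keep-earliest rule, the column names, and the sample rows are assumptions for illustration; the talk does not say which duplicate Oyster keeps.

```python
from typing import Dict, List


def dedupe_events(rows: List[dict]) -> List[dict]:
    """Keep one row per event_id, choosing the earliest collector_tstamp.

    The keep-earliest rule is an assumption, not necessarily Oyster's rule.
    """
    best: Dict[str, dict] = {}
    for row in rows:
        kept = best.get(row["event_id"])
        if kept is None or row["collector_tstamp"] < kept["collector_tstamp"]:
            best[row["event_id"]] = row
    # Return survivors in time order, like ORDER BY collector_tstamp.
    return sorted(best.values(), key=lambda r: r["collector_tstamp"])


rows = [
    {"event_id": "e1", "collector_tstamp": "2016-03-01 10:00:05", "page_type": "hotel"},
    {"event_id": "e1", "collector_tstamp": "2016-03-01 10:00:00", "page_type": "hotel"},
    {"event_id": "e2", "collector_tstamp": "2016-03-01 10:01:00", "page_type": "home"},
]
deduped = dedupe_events(rows)
for row in deduped:
    print(row["event_id"], row["collector_tstamp"])
```

Timestamps are compared as "YYYY-MM-DD HH:MM:SS" strings here, which sort in time order; a real job would compare proper timestamp columns.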