Snowplow Analytics and Looker at Oyster.com

Presentation by Ben Hoyt and Devon Pohl on their journey with Snowplow at Oyster.com, presented at the first Snowplow Meetup New York in March 2016

SNOWPLOW AND LOOKER
AT OYSTER.COM
SNOWPLOW MEETUP NYC – MARCH 30, 2016
BEN HOYT, DEVON POHL

WHAT IS OYSTER.COM?
• “The Hotel Tell-All”
• Authentic hotel reviews and photos
• We visit every hotel in person
• 1000 hotels per month
• 7M high-res photos
• 100k 360° panoramas

(SOME OF) OUR TECH STACK
• Python to run our backend: web, scripting, photo processing, ETL
• PostgreSQL for all content data (e.g. hotels, metadata for 12M images)
• Amazon S3 for image storage, EC2 spot instances for photo processing
• Amazon Redshift for analytics and reporting data
• Looker for reporting and visualizations
• Snowplow for analytics tracking and analytics ETL

GOOGLE ANALYTICS V. SNOWPLOW
Google Analytics
• Good for web, but little control and flexibility
• Hard to get data out of (your data!)
• Crazy pricing model ($0 for free tier, or $150,000/y for premium)
• Can only do web analytics, not other business reporting
Snowplow
• Free and open source, with great support and paid tiers
• Puts data into a standard, easily-queryable database (Redshift)
• Focuses on tracking and analytics ETL and does that part well

WHY & HOW WE SWITCHED (1 YEAR AGO)
• We were considering Looker for reporting and visualization
• Looker rep: “majority of our customers use Snowplow to collect their data”
• We dug into Snowplow and liked what we saw
• Initially the design felt a bit overkill, but it’s definitely built to scale
• We implemented the tracking and pipeline, and haven’t looked back

OUR CONTEXT SCHEMA
• We use one “custom fields” schema to rule them all
• Simple, one table, one SQL join gives us all our custom fields

{
  "self": {"name": "custom_fields", "vendor": "com.oyster", "version": "1-0-9"},
  "properties": {
    "page_type": {"type": "string"},
    "page_subtype": {"type": "string"},
    "template_type": {"type": "string", "enum": ["desktop", "mobile"]},
    "hotel_id": {"$ref": "#/definitions/positiveInteger32"},
    "account_id": {"$ref": "#/definitions/positiveInteger32"},
    "ab_cell": {"type": "integer", "minimum": 1, "maximum": 20},
    "checkin_date": {"type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"},
    ...
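To make the constraints in this schema concrete, here is a minimal sketch of hand-rolled checks that mirror the fields shown on the slide. It is not how Snowplow validates contexts, only an illustration using the Python standard library. The `positiveInteger32` definition is not shown in the slide, so the range 1 to 2^31 - 1 is an assumption, and `invalid_fields` is a hypothetical helper name. Fields hidden behind the slide's "..." are unknown and are simply ignored.

```python
import re

# Assumption: positiveInteger32 means 1..2^31-1; its definition is not on the slide.
INT32_MAX = 2**31 - 1
DATE_PATTERN = r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$"  # copied from the checkin_date field


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_positive_int32(value):
    return _is_int(value) and 1 <= value <= INT32_MAX


# One check per field that is visible on the slide.
CHECKS = {
    "page_type": lambda v: isinstance(v, str),
    "page_subtype": lambda v: isinstance(v, str),
    "template_type": lambda v: v in ("desktop", "mobile"),
    "hotel_id": _is_positive_int32,
    "account_id": _is_positive_int32,
    "ab_cell": lambda v: _is_int(v) and 1 <= v <= 20,
    "checkin_date": lambda v: isinstance(v, str)
    and re.search(DATE_PATTERN, v) is not None,
}


def invalid_fields(context):
    """Return the sorted names of known fields whose values break a constraint."""
    return sorted(
        name
        for name, value in context.items()
        if name in CHECKS and not CHECKS[name](value)
    )
```

For example, `invalid_fields({"template_type": "tablet", "ab_cell": 21})` reports both fields, because "tablet" is outside the enum and 21 is above the maximum of 20.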

OUR DATASET
• A large, though not a massive, dataset
• Redshift cluster: 6 dc1.large SSD nodes, ~1TB storage
• 640 million rows in our events table
• We add 1.5 million event rows per day
• We copy (a subset of) our PostgreSQL content database into Redshift nightly
• Enables business reporting and advanced content-based queries

REPORTING
• Snowplow and content data are merged to provide insights into:
  • Product
    • A/B testing
    • Funnel mapping
  • Marketing
    • SEO monitoring
    • Ad Campaigns
  • Operations
    • Workflow Optimization
    • ROI Modeling
  • Business Trends
    • Traffic
    • Revenue

VISIT TABLE
• Event data is large and granular – often hard to digest
• Most valuable pre-processing we do is building the visit table
• Built incrementally by a Python ETL that runs on Redshift
• This is key to most of our reporting infrastructure
• Combines events and custom fields data
• This visit table:
  • Is granular at the user and user-session-ID level
  • Includes counts of a variety of event types
  • Includes all information associated with the first event of a visit
    • A/B testing cells
    • Referral information
    • Etc.
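The visit table described above can be sketched as a single aggregation plus a join back to the first event of each visit. This is only an illustration, not Oyster's ETL: it runs on an in-memory SQLite database as a stand-in for Redshift, it rebuilds everything rather than building incrementally, and the table layout is an assumption. The column names (`event_id`, `domain_userid`, `domain_sessionidx`, `collector_tstamp`, `event`, `refr_medium`, and `root_id` on the context table) follow Snowplow's usual naming, but the `custom_fields` table name and the choice of columns are hypothetical. It also assumes each visit has a unique earliest timestamp.

```python
import sqlite3

# One row per (user, session): event counts plus attributes of the first event.
VISIT_SQL = """
SELECT v.domain_userid, v.domain_sessionidx, v.page_views, v.total_events,
       e.refr_medium, c.ab_cell
FROM (
    SELECT domain_userid, domain_sessionidx,
           MIN(collector_tstamp) AS first_tstamp,
           SUM(CASE WHEN event = 'page_view' THEN 1 ELSE 0 END) AS page_views,
           COUNT(*) AS total_events
    FROM events
    GROUP BY domain_userid, domain_sessionidx
) AS v
JOIN events AS e
  ON e.domain_userid = v.domain_userid
 AND e.domain_sessionidx = v.domain_sessionidx
 AND e.collector_tstamp = v.first_tstamp
LEFT JOIN custom_fields AS c ON c.root_id = e.event_id  -- the "one SQL join"
ORDER BY v.domain_userid, v.domain_sessionidx
"""


def build_visits(conn):
    """Return the visit rows for whatever is in the events table."""
    return conn.execute(VISIT_SQL).fetchall()


def demo():
    """Load a few made-up events and build the visit rows from them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (
            event_id TEXT, domain_userid TEXT, domain_sessionidx INTEGER,
            collector_tstamp TEXT, event TEXT, refr_medium TEXT);
        CREATE TABLE custom_fields (root_id TEXT, ab_cell INTEGER);
    """)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", [
        ("e1", "u1", 1, "2016-03-30 10:00:00", "page_view", "search"),
        ("e2", "u1", 1, "2016-03-30 10:01:00", "struct", None),
        ("e3", "u1", 1, "2016-03-30 10:02:00", "page_view", None),
        ("e4", "u2", 1, "2016-03-30 11:00:00", "page_view", "email"),
    ])
    conn.executemany("INSERT INTO custom_fields VALUES (?, ?)",
                     [("e1", 3), ("e4", 12)])
    return build_visits(conn)
```

Taking the referrer and A/B cell from the first event only is what keeps the table at one row per visit; later events in the session contribute to the counts but not to those attributes.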

LOOKER
• Looker is our core data exploration and reporting tool
• Web-based YAML + visualization wrapper on Redshift
• Enables self-serve reporting and exploration for non-technical business owners
• Used for other pre-processing via persistent derived tables (PDTs)
• PDTs are query-defined temporary tables built and managed by Looker
• Good for small-to-medium size pre-processing
• Applications include de-duping and revenue attribution
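As a rough illustration of the de-duping use case, the sketch below materializes a table from a query, which is the general idea behind a derived table. It uses SQLite from Python as a stand-in; in Looker the SELECT would live in a derived table definition, which is not shown here. De-duplicating on `event_id` and keeping the earliest `collector_tstamp` is an assumption, since the slides do not say which rule Oyster uses, and `events_deduped` is a hypothetical table name.

```python
import sqlite3

# Keep one row per event_id, with the earliest collector timestamp seen for it.
DEDUPE_SQL = """
CREATE TABLE events_deduped AS
SELECT event_id, MIN(collector_tstamp) AS collector_tstamp
FROM events
GROUP BY event_id
"""


def dedupe(conn):
    """Materialize the de-duplicated table and return its rows in id order."""
    conn.execute("DROP TABLE IF EXISTS events_deduped")
    conn.execute(DEDUPE_SQL)
    return conn.execute(
        "SELECT event_id, collector_tstamp FROM events_deduped ORDER BY event_id"
    ).fetchall()


def demo():
    """Load made-up events, one of them duplicated, and de-duplicate them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_id TEXT, collector_tstamp TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", [
        ("e1", "2016-03-30 10:00:05"),
        ("e1", "2016-03-30 10:00:00"),  # same event received twice
        ("e2", "2016-03-30 10:01:00"),
    ])
    return dedupe(conn)
```

Dropping and rebuilding the table on each run mirrors the idea that the table is defined entirely by its query, so changing the query is enough to change the pre-processing.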

8. PAGE TRACKING EXAMPLE

9. ANALYTICS AND LOOKER (DEVON POHL)

13. DASHBOARDS / SAVED REPORTS

14. EXPLORATION

15. OYSTER.COM The Hotel Tell-All

Editor's Notes

Hi – I’m Devon. I work with Oyster.com and Jetsetter sites at Tripadvisor. Just after I joined, Oyster implemented Snowplow and I’ve spent much of the last year using the platform to build reporting and analytics.
We use Snowplow data for a wide variety of things. With Snowplow we learn about:
• Business health: we have dashboards and daily updates on traffic, revenue and other metrics. We use Looker for most of this, which I’ll talk about more in a moment.
• Product: we’ve learned about our site and users through A/B testing and site traffic mapping.
• Marketing: Snowplow allows us to monitor and analyze marketing efforts, including SEO.
• Operations: we monitor how existing assets are performing and model how prospective assets would likely perform, to prioritize work for the ops and editorial teams.
A clear and intuitive context schema is key to marrying Snowplow to other data sources. Thanks, Ben!
The event-level tables can be difficult to use in their raw form. For most reporting and analytics we use a derivative of the events table, our visit table. This is currently updated incrementally each night with a straightforward ETL. It includes session-level event counts and first-event referral information. We also do marketing-campaign and email-related pre-processing on the events table.
Looker is our core data exploration and reporting tool. I have a couple of screenshots of dashboards and exploration in Looker on the following slides. We use Looker for:
• Automated email reporting
• Saved dashboards and individual reports
• Data exploration, even for non-technical business owners
• Small to medium sized pre-processing jobs, such as de-duping and revenue attribution. This is done through persistent derived tables, which are query-defined temporary tables built and managed by Looker. Because Looker manages them, pre-processed tables can be modified on the fly, which reduces the cost of developing and maintaining pre-processing infrastructure.
