© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Greg Khairallah, Business Development Manager, AWS
Adam Savitzky, Software Development Engineer, Yahoo!
Scott Hoover, Data Scientist, Looker
July 23, 2015
Best Practices: Amazon Redshift
Reporting and Advanced Analytics
Amazon Redshift – Resources
Getting Started – June Webinar Series:
https://www.youtube.com/watch?v=biqBjWqJi-Q
Best Practices – July Webinar Series:
Optimizing Performance – July 21, 2015
Migration and Data Loading – July 22, 2015
Reporting and Advanced Analytics – July 23, 2015
Agenda
• Connecting to Amazon Redshift
• Case Study – Redshift analytics at Yahoo
• Case Study – Redshift Optimizations at Looker
• Questions and Answers
Petabyte scale; massively parallel
Relational data warehouse
Fully managed; zero admin
SSD & HDD platforms
As low as $1,000/TB/Year
Amazon Redshift
Common Customer Use Cases
Traditional Enterprise DW
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business
Companies with Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
SaaS Companies
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude
Custom ODBC and JDBC Drivers
Up to 35% higher performance than open source drivers
Supported by most Business Intelligence tools
Will continue to support PostgreSQL open source drivers
Download drivers from console
Amazon Redshift Partners
Redshift for Analytics at Yahoo
Adam Savitzky
Tech Yahoo, Software Development Engineer
Introduction
Who am I?
• Yahoo growth team
• Supporting analytics for 6 products in Yahoo’s mobile portfolio
In the past:
Introduction
What do we do?
▪ Real-time ad-hoc analytics
▪ Mobile properties
▪ What do we care about?
› Engagement and Activity
› User demographics
› Experimentation
› Funnel analysis
› Modeling revenue and user Lifetime Value
› Cohort analysis and retention
High Level Architecture
[Architecture diagram with components: Mobile App, Hadoop, ETL, S3, Redshift]
Scale
▪ On an average day
› 1 billion events
› 25 million devices
› 2 billion parameter key/value pairs
▪ Planned Capacity
› 21 dc1.8xlarge nodes
› 80 billion events
› 100 million devices
› 50 TB (compressed!)
Data Model
Performance Optimizations
▪ Heavy use of summarization where appropriate
▪ Sort keys and partitioning
▪ Data encoding
Event Schema
[Diagram: tables grouped into Raw Tables, Summary Tables, and Derived Tables]
Tables shown in the diagram:
▪ event_raw (one each for mail, flickr, homerun, stark, arrow) and an event raw union view
▪ param (one each for mail, flickr, homerun, stark, arrow) and a param union view
▪ event hourly, event daily
▪ install, install attribution
▪ user, retention, funnel, first_event date, is_active, param keys
▪ telemetry daily, revenue daily
Case Study
User Retention Analysis
Definitions
▪ Cohort - A group of product users that share one or more attributes
› Example: All users who installed on Monday with Android devices
▪ Retention - How many members of a cohort continue to use the product over time
› Example: 100 users installed on Monday with Android devices. 7 days later, 50 of those users returned to the product. We would say the 7-day retention for this cohort is 50%.
Why Study User Retention?
▪ Quantifies how “sticky” your product is
▪ Allows us to measure Customer Lifetime Value (CLV or LTV)
Why Study User Retention?
[Chart: % Retained, with curves labeled Asymptotic Retention and No Retention]
Why Study User Retention?
[Chart: Total Users over Time, with curves labeled Asymptotic Retention and No Retention]
Calculating User Retention
Definition: For each possible combination of cohort dimensions, for every possible event date, how many devices belong to that cohort, and how many devices from that cohort were active on that day
Example with one dimension, os_name:
event_date  product  install_date  os_name  active_users  cohort_size
monday      mail     monday        android  100           100
tuesday     mail     monday        android  83            100
monday      mail     monday        ios      75            75
tuesday     mail     monday        ios      62            75
Calculating User Retention
Example with one dimension, os_name: What’s my 1 day retention for users who installed on Monday?
event_date  product  install_date  os_name  active_users  cohort_size
monday      mail     monday        android  100           100
tuesday     mail     monday        android  83            100
monday      mail     monday        ios      75            75
tuesday     mail     monday        ios      62            75
Calculating User Retention
Example with one dimension, os_name: What’s my 1 day retention for users who installed on Monday?
event_date  product  install_date  os_name  active_users  cohort_size
tuesday     mail     monday        android  83            100
tuesday     mail     monday        ios      62            75
Total                                       145           175
Aggregate retention across both ios and android is (83 + 62) / (100 + 75) = 145 / 175 ≈ 83%
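The aggregation above can be reproduced with a few lines of Python. This is only an illustrative sketch: the rows are copied from the example table, and the function name `aggregate_retention` is a hypothetical helper, not something from the talk. The point it shows is that aggregate retention sums the active users and the cohort sizes separately before dividing, rather than averaging the per-cohort percentages.

```python
# Rows copied from the example table:
# (event_date, product, install_date, os_name, active_users, cohort_size)
rows = [
    ("monday", "mail", "monday", "android", 100, 100),
    ("tuesday", "mail", "monday", "android", 83, 100),
    ("monday", "mail", "monday", "ios", 75, 75),
    ("tuesday", "mail", "monday", "ios", 62, 75),
]


def aggregate_retention(rows, event_date, install_date):
    """Sum active users and cohort sizes across cohorts, then divide.

    Hypothetical helper for illustration only.
    """
    selected = [r for r in rows if r[0] == event_date and r[2] == install_date]
    active = sum(r[4] for r in selected)
    cohort = sum(r[5] for r in selected)
    return active, cohort, active / cohort


active, cohort, rate = aggregate_retention(rows, "tuesday", "monday")
print(active, cohort, f"{rate:.0%}")  # prints: 145 175 83%
```

Averaging the two per-OS rates (83% and about 82.7%) would weight both cohorts equally regardless of size, which is why the sums are taken first.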
Calculating User Retention
Steps:
1. For each day, determine whether each device was active or not
device_id  date        is_active
1          2015-01-01  1
1          2015-01-02  0
2          2015-01-01  1
2          2015-01-02  1
Calculating User Retention
Steps:
1. For each day, determine whether each device was active or not
2. Join device attributes to results of Step 1
device_id  date        is_active  os   install_date
1          2015-01-01  1          ios  2015-01-01
1          2015-01-02  0          ios  2015-01-01
2          2015-01-01  1          ios  2015-01-01
2          2015-01-02  1          ios  2015-01-01
Calculating User Retention
Steps:
1. For each day, determine whether each device was active or not
2. Join device attributes to results of Step 1
3. SUM is_active column, grouping by date, os, and install_date (and any
other cohort dimensions)
date        active_user_count  os   install_date
2015-01-01  2                  ios  2015-01-01
2015-01-02  1                  ios  2015-01-01
Calculating User Retention
Steps:
1. For each day, determine whether each device was active or not
2. Join device attributes to results of Step 1
3. SUM is_active column, grouping by date, os, and install_date (and any
other cohort dimensions)
4. Join the size of each cohort to the result of Step 3
date        active_user_count  os   install_date  cohort_size
2015-01-01  2                  ios  2015-01-01    2
2015-01-02  1                  ios  2015-01-01    2
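The four steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used at Yahoo, which runs in Redshift. The input shapes (`events`, `devices`, `dates`) and the function name `retention_table` are hypothetical, and the sample data is chosen so that the output matches the final table above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical inputs: raw activity events and a device attribute lookup.
events = [  # (device_id, event_date)
    (1, date(2015, 1, 1)),
    (2, date(2015, 1, 1)),
    (2, date(2015, 1, 2)),
]
devices = {  # device_id -> (os, install_date)
    1: ("ios", date(2015, 1, 1)),
    2: ("ios", date(2015, 1, 1)),
}
dates = [date(2015, 1, 1), date(2015, 1, 2)]


def retention_table(events, devices, dates):
    seen = set(events)
    # Step 1: one row per device per day, flagged active (1) or not (0).
    activity = [(d, day, int((d, day) in seen)) for d in devices for day in dates]
    # Steps 2 and 3: attach cohort attributes, then sum is_active per group.
    active = defaultdict(int)
    for device_id, day, is_active in activity:
        os_name, install_date = devices[device_id]
        active[(day, os_name, install_date)] += is_active
    # Step 4: cohort size is the number of devices sharing the cohort attributes.
    cohort_size = defaultdict(int)
    for os_name, install_date in devices.values():
        cohort_size[(os_name, install_date)] += 1
    return [
        (day, count, os_name, install_date, cohort_size[(os_name, install_date)])
        for (day, os_name, install_date), count in sorted(active.items())
    ]


for row in retention_table(events, devices, dates):
    print(row)
```

Dividing `active_user_count` by `cohort_size` in each output row gives the retention rate for that cohort on that date.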
Demo using Looker
Lessons Learned
▪ Summarize data for optimal query performance (hourly or daily rollups)
▪ Think carefully about data model ahead of time. Choose the right sort keys.
▪ Invest in a good tool for ETL (we use Airflow)
▪ Invest in a good tool for query building and sharing (we use Looker)
▪ Reserve plenty of spare capacity (at least 40% free)
▪ Reserved nodes are much cheaper
▪ DC nodes are faster, but much smaller capacity
Scott Hoover, Data Scientist
Redshift and Looker
• We use Redshift to power our own implementation of Looker, which serves every department with business intelligence and data for analytics.
• I have worked at Looker for just over two years, doing everything from Sales Engineering to Professional Services to Data Engineering. I currently head up our internal analytics efforts.
Introduction
• How Looker uses Redshift to supply business intelligence and drive analytics internally.
• How a few Looker customers use Redshift for reporting and analytics.
Agenda
At Looker, two major use cases drove our decision to go with Redshift:
• fast analysis of usage data (300+ million events);
• centralizing multiple data sources into a single warehouse.
Looker and Redshift
• Customer Health:
- MoM/WoW percent change in usage
- Users added/removed
- User engagement (developer, explorer, consumer, occasional consumer)
- LookML contributions and contributors
• Product Usage:
- Features used/not used
- Release pain points
- Github issue/feature tracking
• Reporting for Sales and Marketing:
- Usage in trial
- Performance to quota (sales, meetings, leads, etc.)
- Lead/prospect fit
- Campaign attribution
- SaaS metrics: MRR, cMRR, Churn
What We Care About Most
Redshift Data Pipeline
Pinger
License
Real-Time RDS
Data Model
Event Data & Everything Else
Event Schema
{
"event_id": "1",
"event_type" : "view_connection",
"created_at" : "2015-07-08 20:04:08 +0000",
"attrs" : { "country" : "US",
"state" : "CA",
"browser" : "Safari/537.36",
"uri" : "%2Fadmin%2Fconnections"
}
},
{
"event_id": "2",
"event_type" : "save_look",
"created_at" : "2015-07-08 20:04:12 +0000",
"attrs" : { "country" : "US",
"state" : "CA",
"browser" : "Safari/537.36",
"look_id" : "32"
}
}
Event Schema
id  type             created_at                 country  state  uri                     browser        error  …  k
1   view_connection  2015-07-08 20:04:08 +0000  US       CA     %2Fadmin%2Fconnections  Safari/537.36  ø      …  k1
2   save_look        2015-07-08 20:04:12 +0000  US       CA     ø                       Safari/537.36  ø      …  k2
…
N   run_query        2015-07-08 22:01:16 +0000  UK       ø      %2Ffields=              Chrome         ø      …  kN
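The move from nested JSON events to one wide table can be illustrated with a short Python sketch. It is an assumption about the general technique, not Looker's actual loader: the function name `flatten` is hypothetical, the sketch keeps the JSON key names (`event_id`, `event_type`) rather than the shortened column names on the slide, and a missing attribute becomes `None` where the slide shows ø.

```python
import json

# The two sample events from the Event Schema slide.
raw = """[
  {"event_id": "1", "event_type": "view_connection",
   "created_at": "2015-07-08 20:04:08 +0000",
   "attrs": {"country": "US", "state": "CA",
             "browser": "Safari/537.36", "uri": "%2Fadmin%2Fconnections"}},
  {"event_id": "2", "event_type": "save_look",
   "created_at": "2015-07-08 20:04:12 +0000",
   "attrs": {"country": "US", "state": "CA",
             "browser": "Safari/537.36", "look_id": "32"}}
]"""


def flatten(events):
    """Promote each key in attrs to a top-level column; missing keys become None."""
    columns = ["event_id", "event_type", "created_at"]
    # First pass: collect every attribute key seen across all events.
    for event in events:
        for key in event["attrs"]:
            if key not in columns:
                columns.append(key)
    # Second pass: build one wide row per event.
    rows = []
    for event in events:
        row = {c: None for c in columns}
        row.update({k: v for k, v in event.items() if k != "attrs"})
        row.update(event["attrs"])
        rows.append(row)
    return columns, rows


columns, rows = flatten(json.loads(raw))
print(columns)
print(rows[1]["uri"])  # prints: None (save_look has no uri attribute)
```

Every distinct attribute adds a column, so the table grows wide and sparse as event types multiply. A columnar store tends to handle that well because sparse columns compress efficiently.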
- explore: events
  extends: license_base
  label: 'Pinger'
  always_filter:
    events.created_date: '30 days'
  joins:
    - join: license
      sql_on: ${events.license_slug} = ${license.new_slug}
      relationship: many_to_one
    - join: license_users
      sql_on: ${events.user_id} = ${license_users.id}
      relationship: many_to_many
    - join: client
      sql_on: ${client.id} = ${events.client_id}
      relationship: many_to_one
    - join: account
      sql_on: ${client.salesforce_account_id} = ${account.id}
      relationship: many_to_one
    - join: opportunity
      sql_on: ${account.id} = ${opportunity.account_id}
      relationship: many_to_one
    [...]
    - join: sessions
      sql_on: ${sessions.event_id} = ${events.id}
      relationship: many_to_one
Event Schema
Everything Else
company_id  account_id       opportunity_id   trial_id            license_id  lead_id             campaign_id         campaign_member_at         …  k
1           E000000zD0IFIA0  E000000Oi9mxIAB  0000014uTRGMA2      1423        00QE000000NqLsvMAF  701E00000006MC7IAM  2013-09-23 23:03:05 +0000  …  k1
1           E000000zD0IFIA0  E000000Oi9mxIAB  0000014uTRGMA2      1423        00QE000000e0ZsYMAU  701E00000006OAaIAM  2014-02-20 22:39:25 +0000  …  k2
1           E000000zD0IFIA0  E000000Oi9mxIAB  0000014uTRGMA2      1423        00QE000000e0ZsYMAU  701E00000008XEbIAM  2015-02-18 00:06:09 +0000  …  k3
2           E000000zrbTgIAI  E000000VuLHhIAN  a06E000000aNOcVIAW  1601        00QE000000XJVJiMAP  701E00000006OB9IAM  2015-04-01 22:04:05 +0000  …  k4
…
N                                                                                                                                                …  kN
- explore: company
  joins:
    - join: account
      sql_on: ${company.account_id} = ${account.id}
      relationship: many_to_one
    - join: opportunity
      sql_on: ${company.opportunity_id} = ${opportunity.id}
      relationship: many_to_one
    - join: lead
      sql_on: ${company.lead_id} = ${lead.id}
      relationship: many_to_one
    - join: contact
      sql_on: ${contact.id} = ${company.contact_id}
      relationship: many_to_one
      fields: [export_set*]
    - join: campaign
      sql_on: ${company.campaign_id} = ${campaign.id}
      relationship: many_to_one
    - join: trial
      sql_on: ${company.trial_id} = ${trial.id}
      relationship: many_to_one
    - join: account_representative
      from: user
      sql_on: ${opportunity.owner_id} = ${account_representative.id}
      fields: [name, count]
      relationship: many_to_one
    - join: license
      sql_on: ${company.account_id} = ${license.salesforce_account_id}
      relationship: one_to_one
Everything Else
Explore and Visualize
Analyze - Lead Scoring
API 3.0
API
• Construct historical data set or “Look.”
• GET “Look” using Looker API.
• Train/test model in R.
• Output PMML file.
• EC2 hosts Openscoring REST service + PMML.
• Hit Salesforce API for new leads; score leads; update each lead record.
• View prioritized lists in Looker.
[Diagram labels: GET lead, UPDATE lead, GET look]
• Scale/Performance
- Transactional databases are not ideal for analytics (slow).
- Redshift scales quickly and is incredibly fast.
• Accessibility
- SQL is in many analysts’ wheelhouse and is easy to adopt.
- Obvious choice for those in the AWS ecosystem or who prefer managed offerings.
• Centralization of data
- Useful when it comes time to tie top-of-funnel actions to bottom-of-funnel behavior.
Why Our Customers Use Redshift
• Backstage/Sonicbids: They built an artist search tool that uses social data from Facebook, Twitter, YouTube, and Soundcloud to inform booking agents on what sort of draw they could expect from a certain artist. They used Snowplow, Redshift, the Looker API, and Elasticsearch to build this system.
How Our Customers Use Redshift
• Smartling: They source website translation snippets from translators the world over. They maintain a database of translated snippets, like “the car is red” in Turkish, in order to validate incoming translations. So, when a request for “the car is blue” in Turkish comes in, they can assess the syntactic validity of the translation.
How Our Customers Use Redshift
Learn more at www.looker.com

 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Greg Khairallah, Business Development Manager, AWS Adam Savitzky, Software Development Engineer, Yahoo! Scott Hoover, Data Scientist, Looker July 23, 2015 Best Practices: Amazon Redshift Reporting and Advanced Analytics
  • 2. Amazon Redshift – Resources Getting Started – June Webinar Series: https://www.youtube.com/watch?v=biqBjWqJi-Q Best Practices – July Webinar Series: Optimizing Performance – July 21, 2015 Migration and Data Loading – July 22, 2015 Reporting and Advanced Analytics – July 23, 2015
  • 3. Agenda • Connecting to Amazon Redshift • Case Study – Redshift analytics at Yahoo • Case Study - Redshift Optimizations at Looker • Questions and Answers
  • 4. Petabyte scale; massively parallel Relational data warehouse Fully managed; zero admin SSD & HDD platforms As low as $1,000/TB/Year Amazon Redshift
  • 5. Common Customer Use Cases
      Traditional Enterprise DW: Reduce costs by extending DW rather than adding HW; Migrate completely from existing DW systems; Respond faster to business
      Companies with Big Data: Improve performance by an order of magnitude; Make more data available for analysis; Access business data via standard reporting tools
      SaaS Companies: Add analytic functionality to applications; Scale DW capacity as demand grows; Reduce HW & SW costs by an order of magnitude
  • 6. Custom ODBC and JDBC Drivers Up to 35% higher performance than open source drivers Supported by most Business Intelligence tools Will continue to support PostgreSQL open source drivers Download drivers from console
  • 8. Redshift for Analytics at Yahoo Adam Savitzky Tech Yahoo, Software Development Engineer
  • 9. Introduction Who am I? • Yahoo growth team • Supporting analytics for 6 products in Yahoo’s mobile portfolio In the past:
  • 10. Introduction What do we do? ▪ Real-time ad-hoc analytics ▪ Mobile properties ▪ What do we care about? › Engagement and Activity › User demographics › Experimentation › Funnel analysis › Modeling revenue and user Lifetime Value › Cohort analysis and retention
  • 11. High Level Architecture Mobile App Hadoop S3 Redshift ETL
  • 12. Scale ▪ On an average day › 1 billion events › 25 million devices › 2 billion parameter key/value pairs ▪ Planned Capacity › 21 dc1.8xlarge nodes › 80 billion events › 100 million devices › 50 TB (compressed!)
  • 14. Performance Optimizations ▪ Heavy use of summarization where appropriate ▪ Sort keys and partitioning ▪ Data encoding
  • 17. Definitions ▪ Cohort - A group of product users that share one or more attributes › Example: All users who installed on Monday with Android devices ▪ Retention - How many members of a cohort continue to use the product over time › Example: 100 users installed on Monday with Android devices. 7 days later, 50 of those users returned to the product. We would say the 7-day retention for this cohort is 50%.
  • 18. Why Study User Retention? ▪ Quantifies how “sticky” your product is ▪ Allows us to measure Customer Lifetime Value (CLV or LTV)
  • 19. Why Study User Retention? [Chart: % retained, comparing asymptotic retention with no retention]
  • 20. Why Study User Retention? [Chart: total users over time, comparing asymptotic retention with no retention]
  • 21. Calculating User Retention Definition: For each possible combination of cohort dimensions, for every possible event date, how many devices belong to that cohort, and how many devices from that cohort were active on that day. Example with one dimension, os_name:
      event_date  product  install_date  os_name  active_users  cohort_size
      monday      mail     monday        android  100           100
      tuesday     mail     monday        android  83            100
      monday      mail     monday        ios      75            75
      tuesday     mail     monday        ios      62            75
  • 22. Calculating User Retention Example with one dimension, os_name: What’s my 1 day retention for users who installed on Monday?
      event_date  product  install_date  os_name  active_users  cohort_size
      monday      mail     monday        android  100           100
      tuesday     mail     monday        android  83            100
      monday      mail     monday        ios      75            75
      tuesday     mail     monday        ios      62            75
  • 24. Calculating User Retention Example with one dimension, os_name: What’s my 1 day retention for users who installed on Monday?
      event_date  product  install_date  os_name  active_users  cohort_size
      tuesday     mail     monday        android  83            100
      tuesday     mail     monday        ios      62            75
      total                                       145           175
      Aggregate retention across both ios and android is (83 + 62) / (100 + 75) = 145 / 175 ≈ 83%
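The aggregate figure on this slide comes from summing active users and cohort sizes across cohorts before dividing, rather than averaging the per-OS percentages. A minimal Python sketch of that arithmetic, using the Tuesday rows from the slide (the function and variable names are illustrative, not from the talk):

```python
# Tuesday rows from the slide: (os_name, active_users, cohort_size).
rows = [
    ("android", 83, 100),
    ("ios", 62, 75),
]


def aggregate_retention(rows):
    """Sum active users and cohort sizes first, then divide once."""
    active = sum(active_users for _, active_users, _ in rows)
    size = sum(cohort_size for _, _, cohort_size in rows)
    return active / size


# Per-cohort retention, for comparison with the aggregate.
per_os = {os_name: active / size for os_name, active, size in rows}
overall = aggregate_retention(rows)

print(per_os)
print(f"aggregate 1-day retention: {overall:.1%}")
```

Summing first weights each cohort by its size, so a small cohort with unusual retention does not skew the overall number the way a plain average of percentages would.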
  • 25. Calculating User Retention Steps: 1. For each day, determine whether each device was active or not
      device_id  date        is_active
      1          2015-01-01  1
      1          2015-01-02  0
      2          2015-01-01  1
      2          2015-01-02  1
  • 26. Calculating User Retention Steps: 1. For each day, determine whether each device was active or not 2. Join device attributes to results of Step 1
      device_id  date        is_active  os   install_date
      1          2015-01-01  1          ios  2015-01-01
      1          2015-01-02  0          ios  2015-01-01
      2          2015-01-01  1          ios  2015-01-01
      2          2015-01-02  1          ios  2015-01-01
  • 27. Calculating User Retention Steps: 1. For each day, determine whether each device was active or not 2. Join device attributes to results of Step 1 3. SUM is_active column, grouping by date, os, and install_date (and any other cohort dimensions)
      date        active_user_count  os   install_date
      2015-01-01  2                  ios  2015-01-01
      2015-01-02  1                  ios  2015-01-01
  • 28. Calculating User Retention Steps: 1. For each day, determine whether each device was active or not 2. Join device attributes to results of Step 1 3. SUM is_active column, grouping by date, os, and install_date (and any other cohort dimensions) 4. Join the size of each cohort to the result of Step 3
      date        active_user_count  os   install_date  cohort_size
      2015-01-01  2                  ios  2015-01-01    2
      2015-01-02  1                  ios  2015-01-01    2
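The four steps can be sketched as a single query. The example below runs it against an in-memory SQLite database from Python so it is self-contained; it is not the query used at Yahoo, and the table names `activity` and `devices` are hypothetical. The sample data assumes device 2's second row is dated 2015-01-02, which is what the Step 3 result on the slide implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 1 output: one row per device per day (hypothetical table name).
CREATE TABLE activity (device_id INTEGER, date TEXT, is_active INTEGER);
-- Device attributes used as cohort dimensions (hypothetical table name).
CREATE TABLE devices (device_id INTEGER, os TEXT, install_date TEXT);

INSERT INTO activity VALUES
  (1, '2015-01-01', 1), (1, '2015-01-02', 0),
  (2, '2015-01-01', 1), (2, '2015-01-02', 1);
INSERT INTO devices VALUES
  (1, 'ios', '2015-01-01'), (2, 'ios', '2015-01-01');
""")

query = """
WITH cohort_sizes AS (            -- input to Step 4: devices per cohort
  SELECT os, install_date, COUNT(*) AS cohort_size
  FROM devices
  GROUP BY os, install_date
)
SELECT a.date,
       SUM(a.is_active) AS active_user_count,   -- Step 3
       d.os,
       d.install_date,
       c.cohort_size
FROM activity a
JOIN devices d                                   -- Step 2
  ON d.device_id = a.device_id
JOIN cohort_sizes c                              -- Step 4
  ON c.os = d.os AND c.install_date = d.install_date
GROUP BY a.date, d.os, d.install_date, c.cohort_size
ORDER BY a.date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Retention for any row is then `active_user_count / cohort_size`; adding another cohort dimension means adding it to `devices`, to both GROUP BY clauses, and to the join on `cohort_sizes`.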
  • 30. Lessons Learned ▪ Summarize data for optimal query performance (hourly or daily rollups) ▪ Think carefully about data model ahead of time. Choose the right sort keys. ▪ Invest in a good tool for ETL (we use Airflow) ▪ Invest in a good tool for query building and sharing (we use Looker) ▪ Reserve plenty of spare capacity (at least 40% free) ▪ Reserved nodes are much cheaper ▪ DC nodes are faster, but much smaller capacity
  • 31. Scott Hoover, Data Scientist Redshift and Looker
  • 32. Introduction • We use Redshift to power our own implementation of Looker, which serves every department with business intelligence and data for analytics. • I have worked at Looker for just over two years, doing everything from Sales Engineering to Professional Services to Data Engineering. I currently head up our internal analytics efforts.
  • 33. Agenda • How Looker uses Redshift to supply business intelligence and drive analytics internally. • How a few Looker customers use Redshift for reporting and analytics.
  • 34. Looker and Redshift At Looker, two major use cases drove our decision to go with Redshift: • fast analysis of usage data (300+ million events); • centralization of multiple data sources into a single warehouse.
  • 35. What We Care About Most
      • Customer Health:
        - MoM/WoW percent change in usage
        - Users added/removed
        - User engagement (developer, explorer, consumer, occasional consumer)
        - LookML contributions and contributors
      • Product Usage:
        - Features used/not used
        - Release pain points
        - Github issue/feature tracking
      • Reporting for Sales and Marketing:
        - Usage in trial
        - Performance to quota (sales, meetings, leads, etc.)
        - Lead/prospect fit
        - Campaign attribution
        - SaaS metrics: MRR, cMRR, Churn
  • 37. Data Model Event Data & Everything Else
  • 38. Event Schema
      {
        "event_id": "1",
        "event_type": "view_connection",
        "created_at": "2015-07-08 20:04:08 +0000",
        "attrs": {
          "country": "US",
          "state": "CA",
          "browser": "Safari/537.36",
          "uri": "%2Fadmin%2Fconnections"
        }
      },
      {
        "event_id": "2",
        "event_type": "save_look",
        "created_at": "2015-07-08 20:04:12 +0000",
        "attrs": {
          "country": "US",
          "state": "CA",
          "browser": "Safari/537.36",
          "look_id": "32"
        }
      }
  • 39. Event Schema
      id  type             created_at                 country  state  uri                     browser        error  …  k
      1   view_connection  2015-07-08 20:04:08 +0000  US       CA     %2Fadmin%2Fconnections  Safari/537.36  ø      …  k1
      2   save_look        2015-07-08 20:04:12 +0000  US       CA     ø                       Safari/537.36  ø      …  k2
      .   .                .                          .        .      .                       .              .      .  .
      N   run_query        2015-07-08 22:01:16 +0000  UK       ø      %2Ffields=              Chrome         ø      …  kN
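The move from the nested JSON events to this wide table can be sketched in Python. This is only an illustration of the flattening idea; the slides do not describe how Looker actually loads events, and the function name `flatten` and the column-ordering rule are assumptions. Attributes missing from an event become `None`, matching the ø cells in the table.

```python
import json

# The two sample events from the previous slide, wrapped in a JSON array.
raw = """
[
  {"event_id": "1", "event_type": "view_connection",
   "created_at": "2015-07-08 20:04:08 +0000",
   "attrs": {"country": "US", "state": "CA",
             "browser": "Safari/537.36", "uri": "%2Fadmin%2Fconnections"}},
  {"event_id": "2", "event_type": "save_look",
   "created_at": "2015-07-08 20:04:12 +0000",
   "attrs": {"country": "US", "state": "CA",
             "browser": "Safari/537.36", "look_id": "32"}}
]
"""


def flatten(event):
    """Lift the nested attrs dict into top-level columns (illustrative helper)."""
    row = {
        "id": event["event_id"],
        "type": event["event_type"],
        "created_at": event["created_at"],
    }
    row.update(event.get("attrs", {}))
    return row


rows = [flatten(event) for event in json.loads(raw)]

# The column set is the union of all keys seen, in first-seen order.
columns = ["id", "type", "created_at"]
for row in rows:
    for key in row:
        if key not in columns:
            columns.append(key)

# Every row gets every column; absent attributes are None (ø on the slide).
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

A wide, sparse table like this suits a columnar store such as Redshift, since queries read only the columns they reference.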
  • 40. Event Schema
      - explore: events
        extends: license_base
        label: 'Pinger'
        always_filter:
          events.created_date: '30 days'
        joins:
          - join: license
            sql_on: ${events.license_slug} = ${license.new_slug}
            relationship: many_to_one
          - join: license_users
            sql_on: ${events.user_id} = ${license_users.id}
            relationship: many_to_many
          - join: client
            sql_on: ${client.id} = ${events.client_id}
            relationship: many_to_one
          - join: account
            sql_on: ${client.salesforce_account_id} = ${account.id}
            relationship: many_to_one
          - join: opportunity
            sql_on: ${account.id} = ${opportunity.account_id}
            relationship: many_to_one
          [...]
          - join: sessions
            sql_on: ${sessions.event_id} = ${events.id}
            relationship: many_to_one
  • 41. Everything Else
      company_id  account_id       opportunity_id   trial_id            license_id  lead_id             campaign_id         campaign_member_at         …  k
      1           E000000zD0IFIA0  E000000Oi9mxIAB  0000014uTRGMA2      1423        00QE000000NqLsvMAF  701E00000006MC7IAM  2013-09-23 23:03:05 +0000  …  k1
      1           E000000zD0IFIA0  E000000Oi9mxIAB  0000014uTRGMA2      1423        00QE000000e0ZsYMAU  701E00000006OAaIAM  2014-02-20 22:39:25 +0000  …  k2
      1           E000000zD0IFIA0  E000000Oi9mxIAB  0000014uTRGMA2      1423        00QE000000e0ZsYMAU  701E00000008XEbIAM  2015-02-18 00:06:09 +0000  …  k3
      2           E000000zrbTgIAI  E000000VuLHhIAN  a06E000000aNOcVIAW  1601        00QE000000XJVJiMAP  701E00000006OB9IAM  2015-04-01 22:04:05 +0000  …  k4
      .           .                .                .                   .           .                   .                   .                          .  .
      N                                                                                                                                                …  kN
  • 42. Everything Else
      - explore: company
        joins:
          - join: account
            sql_on: ${company.account_id} = ${account.id}
            relationship: many_to_one
          - join: opportunity
            sql_on: ${company.opportunity_id} = ${opportunity.id}
            relationship: many_to_one
          - join: lead
            sql_on: ${company.lead_id} = ${lead.id}
            relationship: many_to_one
          - join: contact
            sql_on: ${contact.id} = ${company.contact_id}
            relationship: many_to_one
            fields: [export_set*]
          - join: campaign
            sql_on: ${company.campaign_id} = ${campaign.id}
            relationship: many_to_one
          - join: trial
            sql_on: ${company.trial_id} = ${trial.id}
            relationship: many_to_one
          - join: account_representative
            from: user
            sql_on: ${opportunity.owner_id} = ${account_representative.id}
            fields: [name, count]
            relationship: many_to_one
          - join: license
            sql_on: ${company.account_id} = ${license.salesforce_account_id}
            relationship: one_to_one
  • 44. Analyze - Lead Scoring
      • Construct historical data set or “Look.”
      • GET “Look” using Looker API.
      • Train/test model in R.
      • Output PMML file.
      • EC2 hosts Openscoring REST service + PMML.
      • Hit Salesforce API for new leads; score leads; update each lead record.
      • View prioritized lists in Looker.
      [Diagram labels: API 3.0, API, GET look, GET lead, UPDATE lead]
  • 45. Why Our Customers Use Redshift
      • Scale/Performance
        - Transactional databases are not ideal for analytics (slow).
        - Redshift scales quickly and is incredibly fast.
      • Accessibility
        - SQL is in many analysts’ wheelhouse and is easy to adopt.
        - Obvious choice for those in the AWS ecosystem or who prefer managed offerings.
      • Centralization of data
        - When it comes time to tie top-of-funnel actions to bottom-of-funnel behavior.
  • 46. How Our Customers Use Redshift • Backstage/Sonicbids: They built an artist search tool that uses social data from Facebook, Twitter, YouTube, and Soundcloud to inform booking agents on what sort of draw they could expect from a certain artist. They used Snowplow, Redshift, the Looker API, and Elasticsearch to build this system.
  • 47. How Our Customers Use Redshift • Smartling: They source website translation snippets from translators the world over. They maintain a database of translated snippets, like “the car is red” in Turkish, in order to validate incoming translations. So, when a request for “the car is blue” in Turkish comes in, they can assess the syntactic validity of the translation.
  • 48. Learn more at www.looker.com