© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Modern Cloud Data Warehousing
ft. Equinox Fitness Clubs: Optimize
Your Analytics Practices
ANT202-R
Ryan Kelly
Data Architect
Equinox
Elliott Cordo
VP Data Analytics
Equinox
Lisa Perazzoli
Sr. Product Manager
Amazon Web Services
Raise your hand if you’re using
Amazon Redshift
Your requirements are evolving
Data variety and data volumes are increasing rapidly
Integrate disparate data sets
Democratized access to data in a governed way
Analytic needs are evolving beyond batch reports to real-time and predictive
Incorporation of voice, image recognition, and IoT use cases into applications
Traditionally, analytics used to look like this
Data Warehouse
OLTP ERP CRM LOB
Business Intelligence
Relational data
GBs-TBs scale
Schema defined prior to data load
Operational reporting and ad hoc
Large initial capex + $10k–$50k/TB/year
There is more data than people think.
Data grows … every 5 years
Data platforms need to live for … years and scale …
There are more
data types than
ever before.
There are more ways to analyze data than ever before
Years ago these didn’t exist: Hadoop (11), Elasticsearch (8), Presto (5), Spark (4)
What does data warehouse modernization mean?
Easy to use: Don’t waste time on menial administrative tasks and maintenance
Extends to your Data Lake: Directly analyze data stored in your data lake in open formats
Any scale of data, workloads, and users: Dynamically scale up to guarantee performance even with unpredictable demands and data volumes
Faster time-to-insights: Consistently fast performance, even with thousands of concurrent queries and users
Amazon Redshift
Fast, simple, cost-effective data warehouse that can extend queries to your Data Lake
Fastest: Get faster time-to-insight for all types of analytics workloads; powered by machine learning, columnar storage and MPP
Unlimited scale: Dynamically scale up to guarantee performance even with unpredictable analytical demands and data volumes
Extends your Data Lake: Analyze data in the Amazon S3 Data Lake in-place and in open formats, together with data loaded into Redshift’s high performance SSDs
1/10th the cost: Start at $0.25 per hour, save costs with automated administration tasks and eliminate business impact due to downtime; as low as $1,000 per terabyte per year
Analyze data in open formats such as Parquet, ORC, and JSON, using SQL tools
for their cloud
data warehouse
workloads than
anyone else
Amazon
Redshift
Selected Amazon Redshift Partners
Data Integration Business Intelligence Systems Integrators
Faster Time to Insights
[Chart: normalized queries per hour (QPH) as a % of Redshift 6 months ago (= 100%); higher is better. Values shown: 100%, 181%, 237%, 284%, 350%]
>3x faster
Faster performance
New!
Redshift Query Editor
Query data directly from the AWS console
Results are instantly visible within the console
No need to install and set up an external JDBC/ODBC client
Launched in October!
Redshift Advisor
>96% of clusters have tailored feedback
Provides automated recommendations to help optimize database performance and decrease operating costs
Actionable WLM, COPY, storage, and system maintenance advice for tuning based on continuous workload analysis
Intelligent recommendations
Launched in July!
Amazon Redshift intelligent maintenance
Auto Vacuum
Auto Analyze
Auto WLM Concurrency Setting
Maintenance processes like vacuum and analyze will automatically run in the background.
Redshift will automatically adjust the WLM concurrency setting to deliver optimal throughput.
Moving towards zero-maintenance.
Coming Soon!
Amazon Redshift Elastic Resize (GA)
Adds additional nodes to Redshift cluster
Run queries faster in busy periods
Minimal transition time
Scale compute and storage on-demand
Scale up and down in minutes
New!
[Diagram: Redshift cluster with JDBC/ODBC access, a leader node, compute nodes CN1 to CN4, and backup to Redshift Managed S3]
Concurrency Scaling for bursts of user activity (Preview)
Automatically creates more clusters on-demand
Consistently fast performance even with thousands of concurrent queries
No advance hydration required
Free for >97% of customers: for every 24 hours that your main cluster is in use, you accrue a one-hour credit for Concurrency Scaling
New!
[Diagram: caching layer; backup to Amazon Redshift Managed S3]
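The Concurrency Scaling credit rule can be worked through with a little arithmetic. This is a minimal sketch, assuming credits accrue in proportion to main-cluster usage (one credit hour per 24 hours of use); the function and parameter names are illustrative and are not an AWS API.

```python
def concurrency_scaling_credit_hours(main_cluster_hours: float) -> float:
    """Credit hours accrued: one hour per 24 hours of main-cluster use (assumed proportional)."""
    if main_cluster_hours < 0:
        raise ValueError("main_cluster_hours must be non-negative")
    return main_cluster_hours / 24.0


def billable_scaling_hours(main_cluster_hours: float, scaling_hours_used: float) -> float:
    """Concurrency Scaling hours left to pay for after credits are applied (never negative)."""
    credits = concurrency_scaling_credit_hours(main_cluster_hours)
    return max(0.0, scaling_hours_used - credits)


# A cluster that ran for 30 days (720 hours) and burst for 10 hours in total.
print(concurrency_scaling_credit_hours(720))
print(billable_scaling_hours(720, 10))
```

Under these assumptions, a cluster that is always on accrues about 30 credit hours a month, so occasional bursts stay within the free allowance.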
Data warehouse modernization is also about the transition to data lakes
[Diagram: a data warehouse with OLTP, ERP, CRM, and LOB sources and business intelligence, alongside a data lake with devices, web, sensors, and social sources, a data catalog, and machine learning, DW queries, big data processing, interactive analysis, and real-time insights]
The power of data lakes
Most ways to bring data in
Terabyte to exabyte scale
Security, compliance, and audit capabilities
Run any analytics on the same data without movement
Scale storage and compute independently
Designed for low-cost storage and analytics
[Service icons: Redshift, EMR, Athena, AI Services, Elasticsearch, Kinesis, Snowball, Snowmobile, Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose]
Layers of a Data Lake
Ingest: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Database Migration Service
S3
Discover: AWS Glue
Analyze & infer: Redshift, EMR, Athena, AI Services, Elasticsearch Service, Kinesis
Security
Query the same data with the best analytics tool for the job
Data Lake on AWS: Redshift, EMR, Athena, Kinesis, SageMaker
The importance of open data formats and open APIs
Eliminates data silos and tool lock-ins
Unified access and governance
Platform decisions are long-lived.
Innovation in analytics is high.
Modernizing your data warehouse includes
Extending data warehouse queries into the data lake
Sizing the data warehouse independent of the data lake
Support for open data formats
Integration with a variety of analytical tools in the data lake
Scalability
Unified access and governance
A solution that will last for the next 10+ years
Who we are
Equinox is a company with integrated luxury and lifestyle offerings centered on movement, nutrition, and regeneration
We operate more than 200 locations within every major city across the country, in addition to London and Canada
How hard
could it be?
People check
into the clubs?
Members lift weights and
put them down?
Building neighbors feel
shaking from heavy weights?
More than meets the eye
Many lines of business across 98 clubs & 200+ in total, plus central supporting functions
Digital Products, CRM, Marketing, Creative, Development / Building, Finance, Member Services, Maintenance
Personal training, Pilates, Spa, Group Fitness, Membership/Sales, Retail, Food Services
Digital Products
End user applications
Connections to Apple Health
Connected
Equipment
Pursuit (gamified cycling experience)
Cardio
Digital Assessment
Location Tracking
Connected Tech
The history of data
First there was “LIFE”…
This was Equinox’s
first data warehouse
and was created in 2008
The history of data
Rigorously
Kimball
LIFE was good…
Reporting was reliable
Analytics, sometimes self-serviced!
Customer Profiles
CRM
Email Marketing
Personalization
But sometimes it was bad…
Direct integration with applications, tight coupling
Difficult SDLC, testing cycle, release management
Functional debt
No place to put NEW data
Inflexibility for data science
Expensive commercial software
FML
Version 1.X
About 4 years ago
we purchased
Launched
several apps
running in beta
Very
expensive
Limitations with
integrations
Required
platform-specific
knowledge
Re-centering on our goals
Provide business value
Build technology that differentiates
Reduce cost and go all-in on cloud technology
Adopt modern engineering principles
Make scalable components
Use ephemeral, stateless resources
Use distributed databases
Less focus on individual servers
The “new” school
Just put everything in a Hadoop or Amazon S3 data lake
You don’t need a data warehouse
Everything can just be late bound
Doesn’t work for everything!
Data warehouse vs. data lake
Data Warehouse: reliable high SLA reporting; developer and analyst friendly; efficient for specific types of pipelines
Data Lakes: large immutable data sets; semi-structured and unstructured data sets
Project “Cosmo”
Two-week proof-of-concept
Re-platformed one Teradata app
It worked!
Amazon Redshift, Amazon S3
Bidding farewell
Au revoir
(Not for sale anymore)
“JARVIS” is born
From successful POC to new data platform JARVIS
Data Warehouse, Data Lakes, Data Services
Amazon Redshift, Amazon S3
JARVIS architecture
[Architecture diagram. Groups: Data & Analytics Apps, Equinox Apps, Third Party Apps. Components shown: Informatica, Maximilian, EMR, PT App, Pursuit, Engage, Exact Target, Adobe Social, MOSO, Fitness Agg., Amazon Redshift]
Amazon Redshift benefits
Cost effective: 1/10th the cost of Teradata and SQL Server licensing and maintenance
Low barriers for developers and easy to maintain: much less platform-specific knowledge
Fast and performant: data pipelines reduced from hours to minutes
DevOps friendly: API, automation, multi-cluster
Integration with other AWS services and third-party tools
Sailing on the data lake
High performance, low cost, blob storage on S3
Functioning analytic store (not a dumping ground)
Flexible, late bind strategies where appropriate
Quick setup for external tables
Easily implement DR strategies
Data in the lake
Clickstream data
PURSUIT cycling logs
Club management software logs
Data from software that enhances our services
Big picture
Query the data (toolkit): Amazon Redshift, Amazon EMR, Amazon Athena
Metadata: AWS Glue
Storage: Amazon S3
Immutable app log data - Adobe Analytics
Data ingestion
Adobe Analytics data feeds
Functionality built-in
Choose columns to receive
Specify AWS credentials and Amazon S3 information
Get files daily
Not so fast…
Multiple files are then sent to Amazon S3, including multiple data files, multiple lookup files, and a manifest file describing everything sent
Landing in the data lake
1. Throw all raw files into an S3 landing bucket
2. Use Amazon EMR to aggregate into a single file
3. Save new Parquet file to S3 data lake bucket
Extra credit! Save clean data to a sub-folder named “dt=YYYY-MM-DD”
Partitioning data in separate folders allows for less data to be scanned
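The “dt=YYYY-MM-DD” folder convention can be sketched as a small helper that builds the S3 object key for a cleaned file. This is a minimal illustration; the function name, prefix, and file name are hypothetical and do not come from the talk.

```python
from datetime import date


def partition_key(prefix: str, day: date, filename: str) -> str:
    """Build an S3 object key under a Hive-style dt=YYYY-MM-DD sub-folder."""
    clean_prefix = prefix.strip("/")
    return f"{clean_prefix}/dt={day.isoformat()}/{filename}"


# Hypothetical prefix and file name, for illustration only.
key = partition_key("clickstream", date(2017, 11, 20), "part-0000.parquet")
print(key)
```

Because the date is part of the key, a query that filters on dt only needs to read the matching folders.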
Setting AWS Glue up, part 1
Cleaned data is now in Amazon S3 but it can’t be queried yet
Data must be described in AWS Glue
1. Create a database in Glue to label the data source
2. Create an external table in Glue interface
Setting AWS Glue up, part 2
Set up external table in AWS Glue interface:
1. Select Add Table manually
2. Point table data source to S3 folder location
3. Define schema
4. Define “dt” as an additional column for partition
External tables can also be created in Athena or Redshift: run Create External Table
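For the Athena route, the Create External Table statement can be generated from the same four pieces of information the Glue steps ask for. This is a minimal sketch that only builds the DDL text; the database, table, column, and bucket names are hypothetical, and the helper is not part of any AWS library.

```python
def create_external_table_ddl(database, table, columns, location, partition_column="dt"):
    """Return Hive-style DDL for a Parquet-backed external table partitioned by date."""
    column_sql = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        ")\n"
        f"PARTITIONED BY ({partition_column} string)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


ddl = create_external_table_ddl(
    database="weblogs",                                            # hypothetical database
    table="clickstream",                                           # hypothetical table
    columns=[("visitor_id", "string"), ("page_url", "string")],    # hypothetical schema
    location="s3://example-data-lake/clickstream/",                # hypothetical bucket
)
print(ddl)
```

The partition column is declared separately from the regular columns, which mirrors step 4 above.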
Further describing the lake
The assembled pipeline
[Pipeline diagram: Adobe Analytics, S3, EMR, S3, Glue Data Catalog, Athena, Redshift Spectrum]
But wait… why ALTER TABLE?
Partitioned tables must be told about new data
If the table is not made aware of new data, that data cannot be queried
Alter table easily with Glue crawler or Athena
We built an Athena interaction class in Python for flexibility
On successful EMR job we use this class to repair the table
# Athena is the in-house Athena interaction class described above (not shown in the slides)
a = Athena()
partitions = [
    {'dt': '2017-11-20', 'facility-code': '716'},
    {'dt': '2017-11-20', 'facility-code': '715'},
    {'dt': '2017-11-20', 'facility-code': '714'},
]
a.repair_table(db_name='cyclingops',
               table='cycling_logs', partitions=partitions)
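The in-house Athena class is not shown in the slides, so the following is only a sketch of the kind of statement such a repair step might issue: an ALTER TABLE ... ADD PARTITION built from a list of partition dictionaries. The helper name and bucket are hypothetical, and the partition key is written as facility_code (with an underscore) as an assumption, to stay within conservative identifier rules.

```python
def add_partitions_sql(table, partitions, base_location):
    """Build one ALTER TABLE statement that registers every partition in the list."""
    base = base_location.rstrip("/")
    clauses = []
    for spec in partitions:
        # dt = '2017-11-20', facility_code = '716'
        values = ", ".join(f"{name} = '{value}'" for name, value in spec.items())
        # dt=2017-11-20/facility_code=716
        folder = "/".join(f"{name}={value}" for name, value in spec.items())
        clauses.append(f"PARTITION ({values}) LOCATION '{base}/{folder}/'")
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(clauses)


sql = add_partitions_sql(
    table="cycling_logs",
    partitions=[{"dt": "2017-11-20", "facility_code": "716"}],
    base_location="s3://example-bucket/cycling_logs/",  # hypothetical bucket
)
print(sql)
```

Running a statement like this after each successful EMR job keeps the table's partition list in step with the folders in S3.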
Processing on Redshift
Light transformation via ELT scripts: happens inside of Redshift, orchestrated by Maximilian
Big crunches and semi-structured data processing: happen outside of Redshift, help reserve query capacity
What do our models look like?
Somewhat like star schemas
Flattened: Redshift is columnar so wide tables are A-OK! Distributed joins can be expensive
Rational and conservative use of dimensions, especially “Type 2”
Basically, get answer and put in table!
Sample data model
1. Don’t make dimensions you don’t have to
2. No “junk” dimensions
3. No “mystery” or flag dimensions
f_checkin: checkin_id, member_id, accounting_cd, terminal_name, contact_key, facility_key, checkin_type_desc, checkin_status, checkin_issue_reasons, checkin_date_key, checkin_time_key, checkin_ts_time, checkin_raw_count, checkin_unique_count, checkin_good_count, checkin_good_daily_counter, trial_checkin_count, etl_source_system_cd, etl_row_create_dts, etl_row_update_dts, etl_run_id
d_contact: contact_key, contact_id, …
d_facility: facility_key, facility_id, …
Fall (DIST)STYLE fashions
DISTSTYLE ALL
Each node receives complete table
Reduces disk usage on small-medium size tables
Preferred for table sizes up to 3M rows with slowly changing data
DISTSTYLE KEY
Each node receives portion of data via chosen key
Optimizes JOIN, INSERT INTO, GROUP BY performance
DISTSTYLE EVEN
Each node receives portion of data via round robin
Use if neither option above applies
[Diagram: placement of rows with keyA, keyB, keyC, keyD across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4) under ALL, EVEN, and KEY]
(DIST)STYLE decision time
[Flowchart from Table to DISTSTYLE EVEN, DISTSTYLE KEY, or DISTSTYLE ALL, asking:]
Does the table participate in JOINs?
Does the table contain at least one potential DISTKEY column?
Do query patterns utilize potential DISTKEY columns in JOIN conditions?
Can you tolerate additional storage overhead?
Do the query patterns tolerate reduced parallelism?
The decision is only between two at a time
If one valid DISTKEY column exists then KEY or ALL
If no valid DISTKEY column exists then EVEN or ALL
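The rules of thumb from the two DISTSTYLE slides can be condensed into a small helper. This is a deliberately simplified sketch and not a reconstruction of the flowchart: it assumes ALL is chosen for slowly changing tables of up to 3M rows, KEY when a DISTKEY column is used in JOIN conditions, and EVEN otherwise. The function and parameter names are hypothetical.

```python
def choose_diststyle(row_count, slowly_changing, has_join_distkey, all_max_rows=3_000_000):
    """Pick a Redshift distribution style from three coarse table properties."""
    # Small, slowly changing tables: replicate the full table to every node.
    if row_count <= all_max_rows and slowly_changing:
        return "ALL"
    # A column that appears in JOIN conditions makes a usable distribution key.
    if has_join_distkey:
        return "KEY"
    # No usable key: spread rows round robin.
    return "EVEN"


print(choose_diststyle(500_000, slowly_changing=True, has_join_distkey=True))
print(choose_diststyle(50_000_000, slowly_changing=False, has_join_distkey=True))
print(choose_diststyle(50_000_000, slowly_changing=False, has_join_distkey=False))
```

The helper keeps the slide's point that the real choice is only ever between two styles at a time: KEY or ALL when a valid DISTKEY exists, EVEN or ALL when it does not.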
Optimizations with data lakes
1. Leverage self-described high-compression Parquet files
2. Easily perform delta queries and “what changed analysis” from unloaded snapshots of Redshift tables
3. Use partitions but do not over-partition
4. Lighten compute load on Redshift by using EMR or Athena
5. ELT from S3 to S3 using Spectrum and UNLOAD (make sure to compress!)
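The “what changed analysis” idea can be illustrated in memory. In the talk it is done with queries over unloaded snapshots of Redshift tables; the sketch below is not that implementation, only the same comparison applied to two small lists of rows. The function name and sample rows are hypothetical.

```python
def what_changed(old_rows, new_rows, key):
    """Compare two snapshots of a table and report added, removed, and changed rows."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    changed = [new[k] for k in new if k in old and new[k] != old[k]]
    return {"added": added, "removed": removed, "changed": changed}


# Two hypothetical daily snapshots keyed by member_id.
yesterday = [
    {"member_id": 1, "status": "active"},
    {"member_id": 2, "status": "active"},
]
today = [
    {"member_id": 2, "status": "frozen"},
    {"member_id": 3, "status": "active"},
]
diff = what_changed(yesterday, today, key="member_id")
print(diff)
```

Against snapshots in S3, the same three result sets would typically come from joins or set operations between two dated partitions.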
Supporting actors
Batchy: batch & state, DAG execution
HAMBOT: data quality & monitoring
Teletraan1, Robopager: ops monitoring
Rundeck: scheduling
Jenkins: deployments
How we do deployments
Jenkins workflows
Spin up ephemeral EMR clusters and Maximilian assets
Run major transformations
Run HAMBOT checks
Merge and deploy
V.I.N.CENT bot
Hero for our engineers
Allows ops interactions via Slack chat interface
Much easier for engineers than the console
Can start cluster in seconds
Reduces need for console access
Maximilian bot
Every hero needs a villain
Further ops interactions via Slack
But this is bot to bot communication
Seeks to destroy clusters twice a day
Humans intervene to fend off cluster destruction
Saves money on unused infrastructure
Things are good!
Re-platformed and productionalized
2 apps in 4 months
Finished re-platform in under a year
Dependability – very few operational issues
Faster time-to-benefit via automated regression
Huge cost savings over Teradata
Spreading the love
The new solution worked so well we built Blink
a new data platform too!
It only took 4 months to do the entire re-platforming
Lessons learned
Take advantage of S3/Redshift integration
Use an S3-first approach whenever possible
Develop an architecture that accommodates change
One size doesn’t fit all – each tool serves a purpose
E.g. Sometimes it’s Redshift and other times it’s Redshift Spectrum!
Automate everything
Leverage automated tests & deployments to your analytics environment
Cloud-forward strategy
Micro-service architecture
Gamification & metric driven programming
IoT
Connected cardio, beacons, wearables
Integrated single view of customer & advanced CRM
Machine learning
Recommendations, predictions, NLP, chatbots
Data platforming
Redshift, EMR/Spark, S3/Glue/Spectrum/Athena, Zeppelin Notebooks
We love innovation
Thank you!
Lisa Perazzoli
plisa@amazon.com

 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your SolutionsAmazon Web Services
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSAmazon Web Services
 

Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics Practices (ANT202-R) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Your Analytics Practices (ANT202-R). Ryan Kelly, Data Architect, Equinox; Elliott Cordo, VP Data Analytics, Equinox; Lisa Perazzoli, Sr. Product Manager, Amazon Web Services
  • 3. Raise your hand if you’re using Amazon Redshift
  • 4. Your requirements are evolving: data variety and data volumes are increasing rapidly; integrate disparate data sets; democratized access to data in a governed way; analytic needs are evolving beyond batch reports to real-time and predictive; incorporation of voice, image recognition, and IoT use cases into applications
  • 5. Traditionally, analytics used to look like this: Data Warehouse; LOB, CRM, ERP, OLTP; Business Intelligence. Relational data; GBs-TBs scale; schema defined prior to data load; operational reporting and ad hoc; large initial capex + $10k–$50k/TB/year
  • 6. There is more data than people think. Data grows […] every 5 years. Data platforms need to live for […] years and to scale […].
  • 7. There are more data types than ever before.
  • 8. There are more ways to analyze data than ever before. Chart labels: Hadoop, Elasticsearch, Presto, Spark; Years ago: 11, 8, 5, 4; Didn’t exist
  • 9. What does data warehouse modernization mean? Easy to use: don’t waste time on menial administrative tasks and maintenance. Extends to your Data Lake: directly analyze data stored in your data lake in open formats. Any scale of data, workloads, and users: dynamically scale up to guarantee performance even with unpredictable demands and data volumes. Faster time-to-insights: consistently fast performance, even with thousands of concurrent queries and users.
  • 10. Amazon Redshift: fast, simple, cost-effective data warehouse that can extend queries to your Data Lake. Fastest: get faster time-to-insight for all types of analytics workloads; powered by machine learning, columnar storage and MPP. Unlimited scale: dynamically scale up to guarantee performance even with unpredictable analytical demands and data volumes. Extends your Data Lake: analyze data in the Amazon S3 Data Lake in-place and in open formats, together with data loaded into Redshift’s high performance SSDs; analyze data in open formats such as Parquet, ORC, and JSON, using SQL tools. 1/10th the cost: start at $0.25 per hour, save costs with automated administration tasks and eliminate business impact due to downtime; as low as $1,000 per terabyte per year.
  • 11. for their cloud data warehouse workloads than anyone else. Amazon Redshift
  • 12. Selected Amazon Redshift Partners: Data Integration, Business Intelligence, Systems Integrators
  • 13. Faster Time to Insights. Faster performance: >3x faster (New!). Chart: normalized queries per hour (QPH) as a % of Redshift 6 months ago, assuming Redshift’s QPH 6 months ago = 100%; higher is better. Values: 100%, 181%, 237%, 284%, 350%.
  • 14. Redshift Query Editor (launched in October!): query data directly from the AWS console; results are instantly visible within the console; no need to install and set up an external JDBC/ODBC client
  • 15. Redshift Advisor (launched in July!): >96% of clusters have tailored feedback. Provides automated recommendations to help optimize database performance and decrease operating costs. Actionable WLM, COPY, storage, and system maintenance advice for tuning based on continuous workload analysis. Intelligent recommendations.
  • 16. Amazon Redshift intelligent maintenance (coming soon!): Auto Vacuum, Auto Analyze, Auto WLM Concurrency Setting. Maintenance processes like vacuum and analyze will automatically run in the background. Redshift will automatically adjust the WLM concurrency setting to deliver optimal throughput. Moving towards zero-maintenance.
  • 17. Amazon Redshift Elastic Resize (GA) (New!): adds additional nodes to Redshift cluster; run queries faster in busy periods; minimal transition time; scale compute and storage on-demand; scale up and down in minutes. Diagram labels: JDBC/ODBC, Leader Node, Redshift Cluster, compute nodes CN1, CN2, CN3, CN4, Backup, Redshift Managed S3
  • 18. Concurrency Scaling for bursts of user activity (Preview) (New!): automatically creates more clusters on-demand; consistently fast performance even with thousands of concurrent queries; no advance hydration required. Free for >97% of customers: for every 24 hours that your main cluster is in use, you accrue a one-hour credit for Concurrency Scaling. Diagram labels: Caching Layer, Backup, Amazon Redshift Managed S3
  • 19. Data warehouse modernization is also about the transition to data lakes. Diagram labels: Data Warehouse; Data Lake; OLTP, ERP, CRM, LOB; Business Intelligence; Devices, Web, Sensors, Social; Machine Learning; Data Catalog; DW Queries; Big data processing; Interactive analysis; Real-time insights
  • 20. The power of data lakes: most ways to bring data in; terabyte to exabyte scale; security, compliance, and audit capabilities; run any analytics on the same data without movement; scale storage and compute independently; designed for low-cost storage and analytics. Services shown: Redshift, EMR, Athena, AI Services, Elasticsearch, Kinesis, Snowball, Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose, Snowmobile
  • 21. Layers of a Data Lake. Ingest: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Database Migration Service. Discover: AWS Glue. Analyze & infer: Redshift, EMR, Athena, AI Services, Elasticsearch Service, Kinesis. Security. S3.
  • 22. Query the same data with the best analytics tool for the job. Data Lake on AWS: Redshift, EMR, Athena, Kinesis, SageMaker. The importance of open data formats and open APIs: eliminates data silos and tool lock-ins; unified access and governance. Platform decisions are long-lived. Innovation in analytics is high.
  • 23. Modernizing your data warehouse includes: extending data warehouse queries into the data lake; sizing the data warehouse independent of the data lake; support for open data formats; integration with a variety of analytical tools in the data lake; scalability; unified access and governance; a solution that will last for the next 10+ years
  • 24.
  • 25.
  • 26. Who we are: Equinox is a company with integrated luxury and lifestyle offerings centered on movement, nutrition, and regeneration. We operate more than 200 locations within every major city across the country, in addition to London and Canada.
  • 27. How hard could it be? People check into the clubs? Members lift weights and put them down? Building neighbors feel shaking from heavy weights?
  • 28. More than meets the eye: many lines of business across 98 clubs & 200+ in total, plus central supporting functions. Digital Products CRM Marketing Creative Development / Building Finance Member’s Services Maintenance Personal training Pilates Spa Group Fitness Membership/ Sales Retail Food Services
  • 29. Digital Products: End user applications Connections to Apple Health Connected Equipment Pursuit (gamified cycling experience) Cardio Digital Assessment Location Tracking Connected Tech
  • 30.
  • 31. The history of data: First there was “LIFE”… This was Equinox’s first data warehouse and was created in 2008
  • 32. The history of data: Rigorously Kimball
  • 33. LIFE was good… Reporting was reliable; analytics, sometimes self-serviced! Customer Profiles, CRM, Email Marketing, Personalization
  • 34. But sometimes it was bad… Direct integration with applications, tight coupling; difficult SDLC, testing cycle, release management; functional debt; no place to put NEW data; inflexibility for Data Science; expensive commercial software; FML
  • 35. Version 1.X: About 4 years ago we purchased …; launched several apps running in beta; very expensive; limitations with integrations; required platform-specific knowledge
  • 36. Re-centering on our goals: provide business value; build technology that differentiates; reduce cost and go all-in on cloud technology. Adopt modern engineering principles: make scalable components; use ephemeral, stateless resources; use distributed databases; less focus on individual servers
  • 37. The “new” school: Just put everything in Hadoop or Amazon S3 data lake. You don’t need a data warehouse. Everything can just be late bind. Doesn’t work for everything!
  • 38. Data warehouse vs. data lake. Data Lakes Data Warehouse Reliable high SLA reporting Developer and analyst friendly Efficient for specific types of pipelines Large immutable data sets Semi-structured and unstructured data sets
  • 39. Project “Cosmo”: two week proof-of-concept; re-platformed one Teradata app; it worked! Amazon Redshift, Amazon S3
  • 40. Bidding farewell: Au revoir (Not for sale anymore)
  • 41.
  • 42. “JARVIS” is born: from successful POC to new data platform. JARVIS: Data Warehouse, Data Lakes, Data Services; Amazon Redshift, Amazon S3
  • 43. JARVIS architecture. Diagram labels: Data & Analytics Apps Equinox Apps Third Party Apps Informatica Maximilian EMR PT App Pursuit Engage Exact Target Adobe Social MOSO Fitness Agg. Amazon Redshift
  • 44. Amazon Redshift benefits. Cost effective: 1/10th the cost of Teradata and SQL Server licensing and maintenance. Low barriers for developers & easy to maintain: much less platform specific knowledge. Fast and performant: data pipelines reduced from hours to minutes. Devops friendly: API, automation, multi-cluster. Integration with other AWS services and third party tools.
  • 45. Sailing on the data lake: high performance, low cost, blob storage on S3; functioning analytic store (not a dumping ground); flexible, late bind strategies where appropriate; quick setup for external tables; easily implement DR strategies
  • 46. Data in the lake: clickstream data; PURSUIT cycling logs; club management software logs; data from software that enhances our services
  • 47.
  • 48. Big picture: immutable app log data (Adobe Analytics). Toolkit: query the data with Amazon Redshift, Amazon EMR, Amazon Athena; metadata in AWS Glue; storage in Amazon S3
  • 49. Data ingestion: Adobe Analytics data feeds. Functionality built-in: choose columns to receive; specify AWS credentials and Amazon S3 information; get files daily. Not so fast… Multiple files are then sent to Amazon S3, including multiple data files, multiple lookup files, and a manifest file describing everything sent
  • 50. Landing in the data lake: (1) throw all raw files into an S3 landing bucket; (2) use Amazon EMR to aggregate into single file; (3) save new parquet file to S3 data lake bucket. Extra credit! Save clean data to a sub-folder named “dt=YYYY-MM-DD”; partitioning data in separate folders allows for less data to be scanned.
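
The “dt=YYYY-MM-DD” sub-folder convention from slide 50 can be sketched in a few lines of Python. This is only an illustration: the bucket name, the table name, and the `partition_prefix` helper are hypothetical and do not come from the deck.

```python
from datetime import date


def partition_prefix(bucket: str, table: str, day: date) -> str:
    """Build the S3 prefix for one day's cleaned Parquet output.

    The layout (bucket/table/dt=YYYY-MM-DD/) is an assumed example of the
    "dt=" sub-folder convention; only the "dt=YYYY-MM-DD" part is from the slide.
    """
    return f"s3://{bucket}/{table}/dt={day.isoformat()}/"


# Hypothetical bucket and table names, for illustration only.
print(partition_prefix("example-data-lake", "clickstream", date(2017, 11, 20)))
# prints: s3://example-data-lake/clickstream/dt=2017-11-20/
```

Because each day lands under its own prefix, a query that filters on `dt` only needs to read the matching folders, which is the "less data to be scanned" point the slide makes.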
  • 51. Setting AWS Glue up, part 1: Cleaned data is now in Amazon S3 but it can’t be queried yet. Data must be described in AWS Glue: (1) create a database in Glue to label the data source; (2) create an external table in Glue interface
  • 52. Setting AWS Glue up, part 2. Set up external table in AWS Glue interface: (1) select Add Table manually; (2) point table data source to S3 folder location; (3) define schema; (4) define “dt” as an additional column for partition. External tables can also be created in Athena or Redshift: run Create External Table
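
As a rough sketch of the "run Create External Table" alternative on slide 52, the helper below assembles an Athena-style DDL statement for a Parquet table partitioned by `dt`. The helper name, the example columns, and the S3 location are assumptions made for illustration; the deck does not show the actual schema.

```python
def create_external_table_sql(database, table, columns, location):
    """Return Hive/Athena-style DDL for a Parquet table partitioned by dt.

    `columns` is a list of (name, type) pairs. All names passed in below
    are hypothetical; only the "dt" partition column comes from the slides.
    """
    cols = ",\n  ".join(f"`{name}` {type_}" for name, type_ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n"
        ")\n"
        "PARTITIONED BY (dt string)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Example with made-up columns and location.
print(create_external_table_sql(
    "cyclingops",
    "cycling_logs",
    [("member_id", "string"), ("distance_km", "double")],
    "s3://example-data-lake/cycling_logs/",
))
```

The statement is only built as a string here; it would still have to be submitted through the Athena console, a SQL client, or an API call.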
  • 53. Further describing the lake
  • 54. The assembled pipeline. Diagram labels: Adobe Analytics, EMR, Athena, S3, Glue Data Catalog, Redshift Spectrum, S3
  • 55. But wait… why ALTER TABLE? Partitioned tables must be told about new data. If a table is not made aware of new data, that data cannot be queried. Alter table easily with Glue crawler or Athena. We built an Athena interaction class in Python for flexibility. On successful EMR job we use this class to repair the table: a = Athena(); partitions = [{'dt': '2017-11-20', 'facility-code': '716'}, {'dt': '2017-11-20', 'facility-code': '715'}, {'dt': '2017-11-20', 'facility-code': '714'}]; a.repair_table(db_name='cyclingops', table='cycling_logs', partitions=partitions)
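
The deck does not show the internals of the `Athena` class, so the following is a guess at what a `repair_table`-style call might generate, not the speakers' implementation. It turns the partition dictionaries from slide 55 into a single `ALTER TABLE ... ADD PARTITION` statement; `add_partitions_sql` is a hypothetical name, and the statement omits `LOCATION` on the assumption that the files follow the Hive-style `key=value` folder layout.

```python
def add_partitions_sql(table, partitions):
    """Build one ALTER TABLE statement that registers several partitions.

    Each item in `partitions` maps partition column -> value. Column names
    are backtick-quoted because 'facility-code' contains a hyphen.
    """
    clauses = []
    for spec in partitions:
        pairs = ", ".join(f"`{col}`='{val}'" for col, val in spec.items())
        clauses.append(f"PARTITION ({pairs})")
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(clauses)


partitions = [
    {"dt": "2017-11-20", "facility-code": "716"},
    {"dt": "2017-11-20", "facility-code": "715"},
    {"dt": "2017-11-20", "facility-code": "714"},
]
print(add_partitions_sql("cycling_logs", partitions))
```

In practice the resulting string would be submitted to Athena, for example with the `start_query_execution` call of the boto3 Athena client, once the EMR job has finished writing the new folders.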
  • 57. Processing on Redshift: Light transformation via ELT scripts happens inside of Redshift, orchestrated by Maximilian. Big crunches and semi-structured data processing happen outside of Redshift, which helps reserve query capacity.
  • 58. What do our models look like? Flattened: Redshift is columnar, so wide tables are A-OK! Distributed joins can be expensive. Rational and conservative use of dimensions, especially “Type 2”. Somewhat like star schemas. Basically, get the answer and put it in a table!
  • 59. Sample data model. f_checkin: checkin_id, member_id, accounting_cd, terminal_name, contact_key, facility_key, checkin_type_desc, checkin_status, checkin_issue_reasons, checkin_date_key, checkin_time_key, checkin_ts_time, checkin_raw_count, checkin_unique_count, checkin_good_count, checkin_good_daily_counter, trial_checkin_count, etl_source_system_cd, etl_row_create_dts, etl_row_update_dts, etl_run_id. d_contact: contact_key, contact_id, …. d_facility: contact_key, contact_id, …. Rules: 1. Don’t make dimensions you don’t have to. 2. No “junk” dimensions. 3. No “mystery” or flag dimensions.
  • 60. Fall (DIST)STYLE fashions. DISTSTYLE ALL: each node receives the complete table; reduces disk usage on small-to-medium size tables; preferred for table sizes up to 3M rows with slowly changing data. DISTSTYLE KEY: each node receives a portion of the data via the chosen key; optimizes JOIN, INSERT INTO, and GROUP BY performance. DISTSTYLE EVEN: each node receives a portion of the data via round robin; use if neither option above applies. (Diagram: rows keyA, keyB, keyC, keyD placed across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4) under ALL, EVEN, and KEY.)
  • 61. (DIST)STYLE decision time. Flowchart questions for a table: Does the table participate in JOINs? Can you tolerate additional storage overhead? Do the query patterns tolerate reduced parallelism? Does the table contain at least one potential DISTKEY column? Do query patterns utilize potential DISTKEY columns in JOIN conditions? Outcomes: DISTSTYLE EVEN, DISTSTYLE KEY, DISTSTYLE ALL. The decision is only between two at a time: if one valid DISTKEY column exists, then KEY or ALL; if no valid DISTKEY column exists, then EVEN or ALL.
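The "two at a time" rule above lends itself to a small sketch. The narrowing step (`diststyle_candidates`) follows the slide directly. The tie-break in `choose_diststyle` is an assumption: it borrows the "up to 3M rows with slow changing data" guidance from the preceding DISTSTYLE slide to decide when ALL wins, which the flowchart itself does not state. Both function names are hypothetical.

```python
ALL_MAX_ROWS = 3_000_000  # threshold quoted on the DISTSTYLE ALL slide


def diststyle_candidates(has_valid_distkey):
    """Narrow the choice to two styles, as the decision slide describes."""
    return ("KEY", "ALL") if has_valid_distkey else ("EVEN", "ALL")


def choose_diststyle(has_valid_distkey, row_count, slowly_changing):
    """Pick one style from the two candidates.

    Assumed tie-break: choose ALL only for small, slowly changing tables;
    otherwise fall back to KEY (valid DISTKEY) or EVEN (no valid DISTKEY).
    """
    fallback, _ = diststyle_candidates(has_valid_distkey)
    if row_count <= ALL_MAX_ROWS and slowly_changing:
        return "ALL"
    return fallback


print(choose_diststyle(True, 500_000_000, False))   # large fact table with a join key
print(choose_diststyle(False, 10_000, True))        # small lookup table
```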
  • 62. Optimizations with data lakes: (1) Leverage self-describing, high-compression Parquet files. (2) Easily perform delta queries and “what changed” analysis from unloaded snapshots of Redshift tables. (3) Use partitions, but do not over-partition. (4) Lighten the compute load on Redshift by using EMR or Athena. (5) ELT from S3 to S3 using Spectrum and UNLOAD (make sure to compress!).
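To make the "UNLOAD (make sure to compress!)" tip concrete, here is a minimal sketch that assembles a Redshift UNLOAD statement with the GZIP option. The external table `spectrum.adobe_hits`, the bucket, and the IAM role ARN are placeholders, not values from the talk, and the helper `unload_sql` is hypothetical.

```python
def unload_sql(select_sql, s3_prefix, iam_role):
    """Build a compressed Redshift UNLOAD statement.

    UNLOAD takes the query as a quoted string, so single quotes inside
    the query are doubled.
    """
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "GZIP"
    )


# Placeholder table, bucket, and role, for illustration only.
sql = unload_sql(
    "SELECT * FROM spectrum.adobe_hits WHERE dt = '2017-11-20'",
    "s3://example-data-lake/unload/adobe_hits_",
    "arn:aws:iam::123456789012:role/example-unload-role",
)
print(sql)
```

Selecting from an external (Spectrum) table and unloading back to S3 is what makes this an S3-to-S3 step: Redshift supplies the compute, while the data stays in the lake.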
  • 64. Supporting actors: Batchy (batch & state, DAG execution); HAMBOT (data quality & monitoring); Teletraan1 and Robopager (ops monitoring); Rundeck (scheduling); Jenkins (deployments).
  • 65. How we do deployments: Jenkins workflows spin up ephemeral EMR clusters and Maximilian assets, run major transformations, run HAMBOT checks, then merge and deploy.
  • 66. V.I.N.CENT bot: Hero for our engineers. Allows ops interactions via a Slack chat interface, which is much easier for engineers than the console. Can start a cluster in seconds. Reduces the need for console access.
  • 67. Maximilian bot: Every hero needs a villain. Further ops interactions via Slack, but this is bot-to-bot communication. Seeks to destroy clusters twice a day; humans intervene to fend off cluster destruction. Saves money on unused infrastructure.
  • 69. Things are good! Re-platformed and productionalized 2 apps in 4 months. Finished the re-platform in under a year. Dependability: very few operational issues. Faster time-to-benefit via automated regression. Huge cost savings over Teradata.
  • 70. Spreading the love: The new solution worked so well we built Blink a new data platform too! It only took 4 months to do the entire re-platforming.
  • 71. Lessons learned: Take advantage of S3/Redshift integration; use an S3-first approach whenever possible. Develop an architecture that accommodates change. One size doesn’t fit all: each tool serves a purpose (e.g., sometimes it’s Redshift and other times it’s Redshift Spectrum!). Automate everything; leverage automated tests and deployments to your analytics environment.
  • 73. We love innovation: cloud-forward strategy (micro-service architecture); gamification & metric-driven programming; IoT (connected cardio, beacons, wearables); integrated single view of customer & advanced CRM; machine learning (recommendations, predictions, NLP, chatbots); data platforming (Redshift, EMR/Spark, S3/Glue/Spectrum/Athena, Zeppelin notebooks).
  • 74. Thank you! Lisa Perazzoli plisa@amazon.com