© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Giorgio Nobile
Solutions Architect, Amazon Web Services
Data Lake Solutions: Store, Catalog, and Analyze All Your Data
Defining the AWS data lake
A data lake is an architecture with a virtually limitless, centralized storage platform capable of categorization, processing, analysis, and consumption of heterogeneous data sets
Key data lake attributes
• Decoupled storage and compute
• Rapid ingest and transformation
• Secure multi-tenancy
• Query in place
• Schema on read
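To make the "schema on read" attribute above concrete, here is a minimal sketch in plain Python. It assumes raw JSON lines are stored untouched and that the reader decides which fields and types it needs; the sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are hypothetical and stand in for what a query engine would do, not for any AWS API.

```python
import json

# Raw records as they might land in the lake, stored as-is (made-up samples).
RAW_LINES = [
    '{"device": "sensor-1", "temp": "21.5", "ts": "2018-03-01T10:00:00"}',
    '{"device": "sensor-2", "temp": "19.0", "extra": "ignored"}',
]

# Reader-side schema: field name -> converter applied at read time.
SCHEMA = {"device": str, "temp": float, "ts": str}


def read_with_schema(lines, schema):
    """Apply a schema while reading, not when the data was written."""
    for line in lines:
        raw = json.loads(line)
        # Fields missing from a record become None; unknown fields are dropped.
        yield {
            name: convert(raw[name]) if name in raw else None
            for name, convert in schema.items()
        }


rows = list(read_with_schema(RAW_LINES, SCHEMA))
```

A different consumer could read the same raw lines with a different schema, which is the practical difference from a warehouse where the schema is fixed before loading.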
Traditionally, Analytics Used to Look Like This
Diagram: OLTP, ERP, CRM, and LOB systems feed a data warehouse, which serves business intelligence
• Relational data
• TBs-PBs scale
• Schema defined prior to data load
• Operational reporting and ad hoc
• Large initial capex + $10K–$50K / TB / Year
Data Lakes Extend the Traditional Approach
• Relational and non-relational data
• TBs-EBs scale
• Schema defined during analysis
• Diverse analytical engines to gain insights
• Designed for low-cost storage and analytics
Diagram: OLTP, ERP, CRM, and LOB systems feed the data warehouse and business intelligence; devices, web, sensors, and social sources feed the data lake and its catalog, which serves machine learning, DW queries, big data processing, interactive, and real-time analytics
What can you do with a data lake?
Storage: Amazon S3, Amazon Glacier
• Data warehouse: Amazon Redshift
• Clusterless SQL query: Amazon Athena
• Clusterless ETL: AWS Glue
• Hadoop/Hive/Presto: Amazon EMR
• BI & visualization
• Batch processing
What can you do with a data lake?
Storage: Amazon S3, Amazon Glacier
Streaming and real-time analytics
• AWS Lambda
• Amazon Elasticsearch Service
• Apache Storm on EMR
• Apache Flink on EMR
• Amazon Kinesis Analytics
• Spark Streaming on EMR
• Amazon ElastiCache
What can you do with a data lake?
Storage: Amazon S3, Amazon Glacier
AI and machine learning
• Amazon Polly: life-like speech
• Amazon Lex: conversational engine
• Amazon Rekognition: image analysis
• Deep learning frameworks: MXNet, TensorFlow, Theano, Caffe, Torch
Data Lakes on AWS
• Unmatched durability and availability at exabyte scale
• Comprehensive security, compliance, and audit capabilities
• Object-level controls
• Usage and cost analysis insight into your data
• Most ways to bring data in
• Twice as many partner integrations
Diagram: the data lake (Amazon S3, Amazon Glacier, AWS Glue) at the center, with machine learning, analytics, and Internet of Things workloads around it and data arriving through Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, and Kinesis Video Streams
Optimize costs with data tiering
Tiers from hot to cold: HDFS, Amazon S3 Standard, Amazon S3 Infrequent Access, Amazon Glacier
• Use EMR/Hadoop with local HDFS for hottest data sets
• Store cooler data in S3 and Glacier to reduce costs
• Use S3 Analytics to optimize tiering strategy
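The tiering guidance above can be expressed as an S3 lifecycle configuration. The sketch below only builds the configuration dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the prefix, the 30-day and 90-day thresholds, the bucket name, and the `build_tiering_rules` helper are assumptions chosen for illustration, not values from the presentation.

```python
def build_tiering_rules(prefix, ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration that moves objects to colder tiers."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    rule_id = "tier-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Cooler data: S3 Standard -> S3 Infrequent Access
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    # Cold data: -> Amazon Glacier
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_tiering_rules("raw/clickstream/")

# Applying it needs boto3 and AWS credentials (bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-data-lake-bucket", LifecycleConfiguration=config)
```

In practice the thresholds would come from observed access patterns, which is the role the slide assigns to S3 Analytics.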
Streaming with Amazon Kinesis
Easily collect, process, and analyze data and video streams in real time
• Kinesis Video Streams: capture, process, and store video streams
• Kinesis Data Streams: capture, process, and store data streams
• Kinesis Data Firehose: load data streams into AWS data stores
• Kinesis Data Analytics: analyze data streams with SQL
Amazon Kinesis Data Streaming
Collect, process, and analyze data streams in real time
• Kinesis Data Streams: ingest and store data streams
• Kinesis Data Analytics: aggregate, filter, enrich data
• Kinesis Data Firehose: egress data streams
Integrations shown: EMR/Spark, custom code on EC2, AWS Lambda, Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, Splunk
Real time, fully managed, scalable, secure, cost effective
Amazon Kinesis Data Firehose
Amazon Kinesis Data Analytics
Amazon Kinesis is a Foundational Service Used Across Amazon
• Amazon CloudWatch logs
• Amazon S3 events
• AWS metering
• Amazon.com online catalog
Use Case 1: Clickstream Analytics
Example: Website content recommendations
• Streams website clickstreams for analytics
• Aggregates clickstreams based on user sessions and calculates site metrics
• Loads aggregated metrics to Amazon Redshift
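The aggregation step in this use case (grouping clicks into user sessions and computing site metrics) can be sketched in plain Python. This is an illustration of the idea only, not the SQL a streaming application would run; the 30-minute inactivity gap, the event tuple layout, and the `sessionize` and `site_metrics` helpers are assumptions.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout that ends a session


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp_seconds, page) events into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, page in events:
        by_user[user].append((ts, page))

    sessions = []
    for user, clicks in by_user.items():
        clicks.sort()  # order each user's clicks by timestamp
        current = [clicks[0]]
        for click in clicks[1:]:
            if click[0] - current[-1][0] > gap:
                # Too long since the previous click: close the session.
                sessions.append((user, current))
                current = [click]
            else:
                current.append(click)
        sessions.append((user, current))
    return sessions


def site_metrics(sessions):
    """Compute simple site-level metrics from the sessions."""
    total = len(sessions)
    pages = sum(len(clicks) for _, clicks in sessions)
    return {
        "sessions": total,
        "avg_pages_per_session": pages / total if total else 0.0,
    }


events = [
    ("u1", 0, "/home"), ("u1", 60, "/product"), ("u1", 4000, "/home"),
    ("u2", 10, "/home"),
]
metrics = site_metrics(sessionize(events))
```

Here user `u1` produces two sessions because the third click arrives more than 30 minutes after the second, so the sample yields three sessions in total.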
Use Case 2: Real-time Analytics
Example: Analyze streaming social media data
Storing is not enough, data needs to be discoverable
“Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” (Gartner)
Diagram: CRM, ERP, data warehouse, mainframe data, web, social, log files, machine data, semi-structured, unstructured
AWS Glue—data catalog
Make data discoverable
Automatically discovers data and stores schema
Catalog makes data searchable, and available for ETL
Catalog contains table and job definitions
Computes statistics to make queries efficient
Diagram: Glue Data Catalog (discover data and extract schema; compliance)
Data preparation accounts for ~80% of the work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
AWS Glue—ETL service
Make ETL scripting and deployment easy
Automatically generates ETL code
Code is customizable with Python and Spark
Endpoints provided to edit, debug, test code
Jobs are scheduled or event-based
Serverless
Amazon EMR—Big Data Processing
Analytics and ML at scale
19 open-source projects: Apache Hadoop, Spark, HBase, Presto, and more
Enterprise-grade security
• Easy: launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, cluster tuning
• Latest versions: updated with the latest open source frameworks within 30 days of release
• Low cost: flexible billing with per-second billing, EC2 spot, reserved instances and auto-scaling to reduce costs 50–80%
• Use S3 storage: process data directly in the S3 data lake securely with high performance using the EMRFS connector
Amazon Athena—Interactive Analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
• Query instantly: zero setup cost; just point to S3 and start querying
• Pay per query: pay only for queries run; save 30–90% on per-query costs through compression
• Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
• Easy: serverless, with zero infrastructure and zero administration
• Integrated with QuickSight
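The pay-per-query point on this slide comes down to simple arithmetic: if a query is billed by the amount of data it scans, storing the same data in compressed form lowers the bill proportionally. The sketch below assumes a price of 5 USD per terabyte scanned and a 4:1 compression ratio; both numbers and the helper names are illustrative assumptions, so check current pricing before relying on them.

```python
PRICE_PER_TB_SCANNED = 5.00  # assumed USD price per TB scanned


def query_cost(tb_scanned, price_per_tb=PRICE_PER_TB_SCANNED):
    """Cost of one query that scans `tb_scanned` terabytes."""
    return tb_scanned * price_per_tb


def savings_from_compression(raw_tb, compression_ratio):
    """Fraction saved when the data is stored `compression_ratio` times smaller."""
    raw_cost = query_cost(raw_tb)
    compressed_cost = query_cost(raw_tb / compression_ratio)
    return 1 - compressed_cost / raw_cost


# A full scan of 1 TB of raw data versus the same data compressed 4:1.
saving = savings_from_compression(raw_tb=1.0, compression_ratio=4.0)
```

Under these assumptions the saving is 75%, which falls inside the 30–90% range quoted on the slide; the actual figure depends on the data and the compression format.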
Amazon Redshift – Modern Data Warehousing
Fast, scalable, fully managed data warehouse at 1/10th the cost
Massively parallel, scales from gigabytes to exabytes
Queries data across your Redshift data warehouse and Amazon S3 data lake
• Fast at scale: columnar storage technology to improve I/O efficiency and scale query performance
• Cost-effective: start at $0.25 per hour; as low as $250-$333 per uncompressed terabyte per year
• Open file formats: analyze optimized data formats on direct-attached disks, and all open file formats in S3
• Secure: audit everything; encrypt data end-to-end; extensive certification and compliance
Redshift Spectrum – Data Lake Analytics
Query across your Amazon Redshift data warehouse and your Amazon S3 data lake
Run Redshift SQL queries against Amazon S3
Scale compute and storage separately
Fast query performance
Unlimited concurrency
CSV, ORC, Grok, Avro & Parquet data formats
On demand, pay per query based on data scanned
Diagram: Redshift data, the Redshift Spectrum query engine, and the S3 data lake
Amazon Redshift Cluster Architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, unload, backup, restore
• 2, 16 or 32 slices
Redshift Spectrum
• In-place queries of data on Amazon S3
• Ultra high scale, unlimited concurrency
• CSV, Grok, Avro, Parquet, and more
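Diagram: clients connect over JDBC/ODBC to the leader node, which coordinates compute nodes 1 through N inside the Redshift cluster; the Spectrum fleet sits between the compute nodes and Amazon S3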
Sysco is the leader in selling, marketing, & distributing food
Challenge:
Large volumes of data in multiple systems. Also, high costs
from maintaining on-premises EDW deployment
Solution:
Migrated their on-premises solution to the cloud with
Redshift, S3, EMR, and Athena
Sysco—Analytics on the Data Lake
Diagram: marketing data source and other source systems; ingest raw data from multiple sources into S3; data preparation and ETL process; transformed data in S3; Redshift, Redshift Spectrum, Athena, EMR
• Consolidated data into a single S3 data lake
• Data scientists use EMR notebooks; Athena & Amazon Redshift Spectrum are used by business users for reporting
Nasdaq Uses Amazon Redshift for Fast Queries
• Migrated legacy on-premises warehouse to Redshift
• 4.8B rows inserted per trading day (orders, trades, quotes)
• Ingests data from multiple sources, validates it, and stages it in Amazon S3
• Redshift reads data out of S3 for fast queries
• Presto on EMR and S3 used for analysis of massive historical data set
• Data from all 7 exchanges operated by Nasdaq (orders, quotes, trade executions)
Diagram: flat files and operational databases, S3, Redshift, EMR, SQL clients
Thank you!

Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Implementazione di una soluzione Data Lake.pdf

  • 1. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Giorgio Nobile, Solutions Architect, Amazon Web Services. Data Lake Solutions: store, catalog, and analyze all your data
  • 2. Defining the AWS data lake
      A data lake is an architecture with a virtually limitless centralized storage platform capable of categorization, processing, analysis, and consumption of heterogeneous data sets.
      Key data lake attributes:
      • Decoupled storage and compute
      • Rapid ingest and transformation
      • Secure multi-tenancy
      • Query in place
      • Schema on read
  • 3. Traditionally, Analytics Used to Look Like This
      Diagram labels: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence
      • Relational data
      • TBs-PBs scale
      • Schema defined prior to data load
      • Operational reporting and ad hoc
      • Large initial capex + $10K–$50K / TB / Year
  • 4. Data Lakes Extend the Traditional Approach
      • Relational and non-relational data
      • TBs-EBs scale
      • Schema defined during analysis
      • Diverse analytical engines to gain insights
      • Designed for low-cost storage and analytics
      Diagram labels: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence; Data Lake; Devices, Web, Sensors, Social; Catalog; Machine Learning; DW Queries; Big data processing; Interactive; Real-time
  • 5. What can you do with a data lake?
      • Amazon S3, Amazon Glacier
      • Data warehouse: Amazon Redshift
      • Batch processing (Hadoop/Hive/Presto): Amazon EMR
      • Clusterless SQL query: Amazon Athena
      • Clusterless ETL: AWS Glue
      • BI & visualization
  • 6. What can you do with a data lake?
      • Amazon S3, Amazon Glacier
      • Streaming and real-time analytics: AWS Lambda, Amazon Elasticsearch Service, Apache Storm on EMR, Apache Flink on EMR, Amazon Kinesis Analytics, Spark Streaming on EMR, Amazon ElastiCache
  • 7. What can you do with a data lake?
      • Amazon S3, Amazon Glacier
      • AI and machine learning:
          • Life-like speech: Amazon Polly
          • Conversational engine: Amazon Lex
          • Image analysis: Amazon Rekognition
          • Deep learning frameworks: MXNet, TensorFlow, Theano, Caffe, Torch
  • 8. Data Lakes on AWS
      • Unmatched durability and availability at exabyte scale
      • Comprehensive security, compliance, and audit capabilities
      • Object-level controls
      • Usage and cost analysis insight into your data
      • Most ways to bring data in
      • Twice as many partner integrations
      Diagram labels: Data Lake (Amazon S3, Amazon Glacier, AWS Glue); Machine Learning; Analytics; Internet of Things; Snowball; Snowmobile; Kinesis Data Firehose; Kinesis Data Streams; Kinesis Video Streams
  • 9. Optimize costs with data tiering
      Tiers, hot to cold: HDFS, Amazon S3 Standard, Amazon S3 Infrequent Access, Amazon Glacier
      • Use EMR/Hadoop with local HDFS for hottest data sets
      • Store cooler data in S3 and Glacier to reduce costs
      • Use S3 Analytics to optimize tiering strategy
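The tiering idea on slide 9 can be sketched as an S3 lifecycle configuration. This is a minimal illustration, not something shown in the deck: the function names, the `raw/` prefix, and the 30-day and 90-day thresholds are assumptions chosen for the example. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts.

```python
def build_tiering_rules(prefix: str, ia_after_days: int = 30,
                        glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration that moves objects to colder tiers.

    The day thresholds are illustrative assumptions; in practice they would
    come from access-pattern analysis such as S3 Analytics.
    """
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Warm tier: S3 Standard-Infrequent Access
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    # Cold tier: Glacier
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_tiering(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rule builder works without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = build_tiering_rules("raw/")
```

Keeping the rule builder separate from the API call makes the policy easy to review and test before it touches a real bucket.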
  • 10. Streaming with Amazon Kinesis
      Easily collect, process, and analyze data and video streams in real time
      • Kinesis Video Streams: capture, process, and store video streams
      • Kinesis Data Streams: capture, process, and store data streams
      • Kinesis Data Firehose: load data streams into AWS data stores
      • Kinesis Data Analytics: analyze data streams with SQL
  • 11. Amazon Kinesis Data Streaming
      Collect, process, and analyze data streams in real time
      • Kinesis Data Streams: ingest, store data streams
      • Kinesis Data Analytics: aggregate, filter, enrich data
      • Kinesis Data Firehose: egress data streams
      • Real time, fully managed, scalable, secure, cost effective
      Diagram labels: EMR/Spark; Custom code on EC2; AWS Lambda; Amazon S3; Amazon Redshift; Amazon Elasticsearch Service; Splunk
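As a rough sketch of the "ingest" step on slide 11, the snippet below prepares the arguments for a Kinesis Data Streams `put_record` call. The stream name, the event shape, and the choice of `user_id` as partition key are assumptions made for illustration; the deck does not specify them.

```python
import json


def build_put_record_args(stream_name: str, event: dict,
                          key_field: str = "user_id") -> dict:
    """Serialize an event into the keyword arguments that put_record expects."""
    return {
        "StreamName": stream_name,
        # Kinesis records carry opaque bytes; JSON is one common encoding.
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[key_field]),
    }


def send_event(stream_name: str, event: dict) -> dict:
    """Send one event (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder can be used offline

    kinesis = boto3.client("kinesis")
    return kinesis.put_record(**build_put_record_args(stream_name, event))


args = build_put_record_args("clickstream", {"user_id": 42, "page": "/home"})
```

Partitioning by user keeps each user's events ordered within a shard, which helps downstream session-based aggregation.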
  • 12. Amazon Kinesis Data Firehose
  • 13. Amazon Kinesis Data Analytics
  • 14. Amazon Kinesis is a Foundational Service Used Across Amazon
      • Amazon CloudWatch Logs
      • Amazon S3 events
      • AWS metering
      • Amazon.com online catalog
  • 15. Use Case 1: Clickstream Analytics
      Example: Website content recommendations
      • Streams website clickstreams for analytics
      • Aggregates clickstreams based on user sessions and calculates site metrics
      • Loads aggregated metrics to Amazon Redshift
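The aggregation step on slide 15 (grouping clickstreams into user sessions and computing site metrics) can be sketched in plain Python. This is an illustration of the idea rather than the pipeline the deck describes: the 30-minute inactivity gap, the `(user_id, timestamp, page)` tuple layout, and the chosen metrics are all assumptions.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumption: 30 minutes of inactivity ends a session


def sessionize(clicks, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, page) clicks into per-user sessions."""
    by_user = defaultdict(list)
    for user_id, ts, page in clicks:
        by_user[user_id].append((ts, page))

    sessions = []
    for user_id, events in by_user.items():
        events.sort()  # order each user's clicks by timestamp
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > gap:
                # Inactivity gap exceeded: close the session, start a new one.
                sessions.append((user_id, current))
                current = []
            current.append(event)
        sessions.append((user_id, current))
    return sessions


def site_metrics(sessions):
    """Compute simple site-wide metrics from sessionized clicks."""
    page_views = sum(len(events) for _, events in sessions)
    return {
        "sessions": len(sessions),
        "page_views": page_views,
        "avg_pages_per_session": page_views / len(sessions) if sessions else 0.0,
    }


clicks = [
    ("u1", 0, "/home"),
    ("u1", 60, "/products"),
    ("u1", 4000, "/home"),   # more than 30 minutes later: a new session
    ("u2", 10, "/home"),
]
metrics = site_metrics(sessionize(clicks))
```

In a streaming setting the same logic would typically run over windows of the stream, with the resulting metrics rows loaded into the warehouse.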
  • 16. Use Case 2: Real-time Analytics. Example: Analyze streaming social media data
  • 17. Storing is not enough, data needs to be discoverable
      "Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing)." (Gartner)
      Diagram labels: CRM; ERP; Data warehouse; Mainframe data; Web; Social; Log files; Machine data; Semi-structured; Unstructured
  • 18. AWS Glue: data catalog
      Make data discoverable
      • Automatically discovers data and stores schema
      • Catalog makes data searchable, and available for ETL
      • Catalog contains table and job definitions
      • Computes statistics to make queries efficient
      Diagram labels: Compliance; Glue Data Catalog; Discover data and extract schema
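To make "discover data and extract schema" on slide 18 concrete, here is a toy schema-inference pass over JSON lines. It is only a sketch of the schema-on-read idea and does not reproduce how AWS Glue crawlers work; the function name and the sample records are hypothetical.

```python
import json


def infer_schema(json_lines):
    """Collect, per field, the set of Python type names observed in the data."""
    observed = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            observed.setdefault(field, set()).add(type(value).__name__)
    # Sort type names so the result is deterministic.
    return {field: sorted(types) for field, types in observed.items()}


sample = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "price": 9.5}',
    '{"id": "3"}',
]
schema = infer_schema(sample)
```

A field that maps to more than one type (here `id`) is the kind of conflict a catalog has to resolve or flag before the data can be queried reliably.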
  • 19. Data preparation accounts for ~80% of the work
      Chart categories: Building training sets; Cleaning and organizing data; Collecting data sets; Mining data for patterns; Refining algorithms; Other
  • 20. AWS Glue: ETL service
      Make ETL scripting and deployment easy
      • Automatically generates ETL code
      • Code is customizable with Python and Spark
      • Endpoints provided to edit, debug, test code
      • Jobs are scheduled or event-based
      • Serverless
  • 21. Amazon EMR: Big Data Processing
      • Analytics and ML at scale: 19 open-source projects, including Apache Hadoop, Spark, HBase, Presto, and more
      • Enterprise-grade security
      • Latest versions: updated with the latest open source frameworks within 30 days of release
      • Low cost: flexible billing with per-second billing, EC2 spot, reserved instances and auto-scaling to reduce costs 50–80%
      • Use S3 storage: process data directly in the S3 data lake securely with high performance using the EMRFS connector
      • Easy: launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, cluster tuning
  • 22. Amazon Athena: Interactive Analysis
      • Interactive query service to analyze data in Amazon S3 using standard SQL
      • No infrastructure to set up or manage and no data to load
      • Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
      • Query instantly: zero setup cost; just point to S3 and start querying
      • Pay per query: pay only for queries run; save 30–90% on per-query costs through compression
      • Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
      • Easy: serverless, with zero infrastructure and zero administration; integrated with QuickSight
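The compression savings quoted on slide 22 follow from billing by the amount of data scanned. The arithmetic below is a sketch under stated assumptions: the price per terabyte is a placeholder parameter, not a figure from the deck, and the model assumes the query scans the whole data set before and after compression.

```python
PRICE_PER_TB_USD = 5.0  # assumption: illustrative price, check current pricing
BYTES_PER_TB = 1024 ** 4


def query_cost(bytes_scanned: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Cost of a query that is billed by the volume of data scanned."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def compression_savings(raw_bytes: float, compression_ratio: float) -> float:
    """Fraction of per-query cost saved when data shrinks by compression_ratio."""
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be >= 1")
    before = query_cost(raw_bytes)
    after = query_cost(raw_bytes / compression_ratio)
    return 1 - after / before


# A hypothetical 3:1 compression ratio cuts the scanned volume to one third.
savings = compression_savings(raw_bytes=10 * BYTES_PER_TB, compression_ratio=3.0)
```

Under this model the saving is simply 1 - 1/ratio, so ratios between roughly 1.4:1 and 10:1 span the 30–90% range on the slide.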
  • 23. Amazon Redshift: Modern Data Warehousing
      • Fast, scalable, fully managed data warehouse at 1/10th the cost
      • Massively parallel, scales from gigabytes to exabytes
      • Queries data across your Redshift data warehouse and Amazon S3 data lake
      • Fast at scale: columnar storage technology to improve I/O efficiency and scale query performance
      • Cost-effective: start at $0.25 per hour; as low as $250-$333 per uncompressed terabyte per year
      • Open file formats: analyze optimized data formats on direct-attached disks, and all open file formats in S3
      • Secure: audit everything; encrypt data end-to-end; extensive certification and compliance
  • 24. Redshift Spectrum: Data Lake Analytics
      Query across your Amazon Redshift data warehouse and your Amazon S3 data lake
      • Run Redshift SQL queries against Amazon S3
      • Scale compute and storage separately
      • Fast query performance
      • Unlimited concurrency
      • CSV, ORC, Grok, Avro & Parquet data formats
      • On demand, pay per query based on data scanned
      Diagram labels: S3 data lake; Redshift data; Redshift Spectrum query engine
  • 25. Amazon Redshift Cluster Architecture
      Leader node
      • SQL endpoint
      • Stores metadata
      • Coordinates parallel SQL processing
      Compute nodes
      • Local, columnar storage
      • Executes queries in parallel
      • Load, unload, backup, restore
      • 2, 16 or 32 slices
      Redshift Spectrum
      • In-place queries of data on Amazon S3
      • Ultra high scale, unlimited concurrency
      • CSV, Grok, Avro, Parquet, and more
      Diagram labels: JDBC/ODBC; Redshift Cluster; Leader Node; Compute Nodes (1, 2, 3, 4, ... N); Spectrum Fleet; Amazon S3
  • 27. Sysco is the leader in selling, marketing, & distributing food
      • Challenge: Large volumes of data in multiple systems. Also, high costs from maintaining on-premises EDW deployment
      • Solution: Migrated their on-premises solution to the cloud with Redshift, S3, EMR, and Athena
  • 28. Sysco: Analytics on the Data Lake
      • Sysco is the leader in selling, marketing, & distributing food
      • Challenge: large volumes of data in multiple systems
      • Consolidated data into a single S3 data lake
      • Data scientists use EMR notebooks
      • Athena & Amazon Redshift Spectrum used by business users for reporting
      Diagram labels: Marketing data source; Other source systems; Ingest raw data from multiple sources; S3; ETL process; Data preparation; Transformed data; S3; Redshift; Redshift Spectrum; Athena; EMR
  • 29. Nasdaq Uses Amazon Redshift for Fast Queries
      • Migrated legacy on-premises warehouse to Redshift
      • 4.8B rows inserted per trading day (orders, trades, quotes)
      • Ingests data from multiple sources, validates it, and stages it in Amazon S3
      • Redshift reads data out of S3 for fast queries
      • Presto on EMR and S3 used for analysis of massive historical data set
      • Data from all 7 exchanges operated by Nasdaq (orders, quotes, trade executions)
      Diagram labels: Flat files; Operational Databases; S3; EMR; Redshift; SQL Clients
  • 30. Thank you!