© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
BI & Analytics - A Datalake on AWS
Johan Broman
Manager, Solutions Architecture
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Data Drives Better Decision Making
Business Outcomes on a Modern Data Architecture
Outcome 1: Modernize and consolidate
• Insights to enhance business applications and create new digital services
Outcome 2: Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3: Real-time engagement
• Interactive customer experience, event-driven automation, fraud detection
Outcome 4: Automate for expansive reach
• Automation of business processes and physical infrastructure
Legacy Data Architectures Exist as Isolated Data Silos
(Diagram: separate, disconnected silos for a Hadoop cluster, a SQL database, and a data warehouse appliance.)
Challenges with Legacy Data Architectures
• Can’t move data across silos
• Can’t deal with dynamic data and real-time processing
• Can’t deal with format diversity and rate of change
• Complex ETL processes
• Difficult to find people with the skills to configure and manage these systems
• Can’t integrate with the explosion of available social and behavior-tracking data
Legacy Data Architectures Are Monolithic
Multiple layers of functionality, all on a single cluster.
(Diagram: a Hadoop master node coordinating worker nodes, each bundling its own CPU, memory, and HDFS storage.)
Enter Data Lake Architectures
A data lake is an increasingly popular architecture for storing and analyzing massive volumes of heterogeneous data.
Benefits of a Data Lake – All Data in One Place
Store and analyse all of your data, from all of your sources, in one centralised location.
“My data is distributed across many locations. Where is the single source of truth?”
Benefits of a Data Lake – Quick Ingest
Quickly ingest data without needing to force it into a pre-defined schema.
“How can I collect data quickly from various sources and store it efficiently?”
Benefits of a Data Lake – Storage vs Compute
Separating storage from compute lets you scale each component independently, as required.
“How can I scale up with the volume of data being generated?”
Benefits of a Data Lake – Schema on Read
“Is there a way I can apply multiple analytics and processing frameworks to the same data?”
A Data Lake enables ad-hoc analysis by applying schemas on read, not write.
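To make schema on read concrete, here is a minimal, self-contained Python sketch (not an AWS API): raw JSON lines are stored untouched at ingest time, and each consumer projects and casts only the fields it needs when it reads. The record layout and the `read_with_schema` helper are hypothetical.

```python
import json

# Hypothetical raw events, stored as-is: no schema is enforced on write.
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ts": "2018-05-01T10:00:00Z"}',
    '{"user": "bob", "action": "purchase", "amount": "19.99", "ts": "2018-05-01T10:05:00Z"}',
    '{"user": "carol", "action": "click"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    `schema` maps field name -> conversion function. Only those fields are
    returned; fields missing from a record become None.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        })
    return rows


# Two consumers read the same raw data through different schemas.
activity_schema = {"user": str, "action": str}
revenue_schema = {"user": str, "amount": float}

activity = read_with_schema(RAW_EVENTS, activity_schema)
revenue = [row for row in read_with_schema(RAW_EVENTS, revenue_schema)
           if row["amount"] is not None]
```

Because the raw lines never change, adding a third consumer means writing a new schema, not reloading or migrating the stored data.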
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Expanding access requirements
Who needs access: data scientists, data analysts, business users, engagement platforms, and automation / events
1. More personas need access to data, through appropriate tools
2. More systems need to link to data for decision and process automation
3. Users need to be able to find information, and access it securely
Exponential growth of business data
Sources: transactions, ERP, connected devices, social media, web logs / cookies
1. Data must be captured from diverse sources at speed and scale
2. Data needs to be pulled together, breaking down traditional silos
3. Benefits need to far outweigh the costs of collection and analysis
Important Components of a Data Lake
• Catalogue & Search
• Protect & Secure
• Access & User Interface
• Ingest & Store
AWS Approach to Data Lakes
S3 is the Data Lake
Why Amazon S3 for the Data Lake?
• Durable: designed for eleven 9s (99.999999999%) of durability
• Available: designed for 99.99% availability
• High performance: multipart upload, range GET
• Scalable: store as much as you need; scale storage and compute independently; no minimum usage commitments
• Integrated: Amazon Redshift / Spectrum, Amazon EMR, Amazon Athena, Amazon DynamoDB
• Easy to use: simple REST API, AWS SDKs, read-after-create consistency, event notification, lifecycle policies
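The "High performance" features above boil down to splitting one large transfer into byte ranges that can move in parallel. The pure-Python sketch below only plans those ranges; it makes no S3 calls. The limits encoded (5 MiB minimum part size except for the last part, 5 GiB maximum part size, 10,000 parts per upload, 5 TiB maximum object size) are S3's documented multipart limits; the 8 MiB default part size and the helper names are illustrative assumptions.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB                    # every part except the last must be at least 5 MiB
MAX_PART_SIZE = 5 * 1024 * MIB             # a single part may be at most 5 GiB
MAX_PARTS = 10_000                         # at most 10,000 parts per multipart upload
MAX_OBJECT_SIZE = 5 * 1024 * 1024 * MIB    # 5 TiB, the largest single S3 object


def plan_multipart(object_size, part_size=8 * MIB):
    """Split object_size bytes into (part_number, offset, length) tuples.

    part_size (an assumed default of 8 MiB) is raised when needed so that
    the upload stays within MAX_PARTS parts.
    """
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object_size must be between 1 byte and 5 TiB")
    # Ceiling division gives the smallest part size that fits in MAX_PARTS parts.
    part_size = max(part_size, MIN_PART_SIZE, -(-object_size // MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("part_size exceeds the 5 GiB per-part limit")
    parts = []
    offset, number = 0, 1  # S3 part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def range_headers(object_size, chunk_size):
    """Build HTTP Range header values (inclusive byte offsets) for parallel range GETs."""
    return [
        f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
        for start in range(0, object_size, chunk_size)
    ]
```

With boto3, each planned part would correspond to one `upload_part` call and each header value to the `Range` argument of `get_object`; issuing those calls concurrently is what delivers the throughput gain.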
Implement the right cloud security controls
Security
• Identity and Access Management (IAM) policies
• Bucket policies
• Access Control Lists (ACLs)
• Private VPC endpoints to Amazon S3
• Pre-signed S3 URLs
Encryption
• SSL endpoints
• Server-side encryption with S3-managed keys (SSE-S3)
• Server-side encryption with customer-provided keys (SSE-C) or AWS KMS-managed keys (SSE-KMS)
• Client-side encryption
Audit & Compliance
• Bucket access logs
• Lifecycle management policies
• Versioning & MFA delete
• Certifications: HIPAA, PCI, SOC 1/2/3, etc.
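As one example of combining bucket policies with SSL endpoints, the sketch below builds a commonly used bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. It only constructs the JSON document in Python; the bucket name `example-datalake-bucket` is a hypothetical placeholder.

```python
import json


def tls_only_bucket_policy(bucket_name):
    """Build an S3 bucket policy that denies every request not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" when the request arrived over plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Hypothetical bucket name, for illustration only.
policy_json = json.dumps(tls_only_bucket_policy("example-datalake-bucket"), indent=2)
```

Saved to a file, the output could be applied with `aws s3api put-bucket-policy --bucket example-datalake-bucket --policy file://policy.json`. An explicit Deny is used because it overrides any Allow granted elsewhere by IAM or ACLs.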
Data Ingestion into S3
• AWS Direct Connect
• AWS Snowball
• ISV Connectors
• Amazon Kinesis Firehose
• AWS Storage Gateway
• S3 Transfer Acceleration
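Of these options, Kinesis Firehose buffers incoming records and delivers them to S3 in batches, flushing when either a buffer size or a buffer interval is reached, whichever comes first, under a UTC `YYYY/MM/DD/HH` key prefix by default. The class below is a hypothetical in-memory simulation of that behavior for illustration; it is not the Firehose API, and it uses a dictionary in place of real S3 writes.

```python
from datetime import datetime, timezone


class BufferedDelivery:
    """Hypothetical simulation of size/time-based buffering before writing to S3."""

    def __init__(self, prefix, max_bytes, max_age_seconds):
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.buffer = []
        self.buffered_bytes = 0
        self.first_arrival = None
        self.delivered = {}  # object key -> payload; stands in for S3 PUTs

    def put(self, record: bytes, now: datetime):
        """Buffer one record; `now` must be timezone-aware.

        Simplification: the age check runs only when a record arrives,
        whereas a real service would also flush on a timer.
        """
        if self.first_arrival is None:
            self.first_arrival = now
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        age = (now - self.first_arrival).total_seconds()
        if self.buffered_bytes >= self.max_bytes or age >= self.max_age_seconds:
            self.flush(now)

    def flush(self, now: datetime):
        if not self.buffer:
            return
        # Time-based key layout (YYYY/MM/DD/HH in UTC) keeps objects easy to partition.
        utc = now.astimezone(timezone.utc)
        key = f"{self.prefix}{utc:%Y/%m/%d/%H}/batch-{len(self.delivered):05d}"
        self.delivered[key] = b"\n".join(self.buffer)
        self.buffer = []
        self.buffered_bytes = 0
        self.first_arrival = None
```

Batching many small records into fewer, larger objects keeps per-request overhead low and produces files that downstream query engines scan efficiently.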
Amazon Athena: Interactive Analysis
• Query instantly: zero setup cost; just point to Amazon S3 and start querying.
• Pay per query: pay only for queries run; save 30–90% on per-query costs through compression.
• Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types.
• Easy: serverless, with zero infrastructure and zero administration; integrated with Amazon QuickSight.
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
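The "save 30–90% through compression" claim follows from Athena billing by bytes scanned: Athena rounds each query up to the nearest megabyte, with a 10 MB minimum. The arithmetic sketch below assumes a list price of $5 per TB scanned and decimal units (both assumptions; check current pricing), plus a hypothetical 4:1 compression ratio.

```python
import math

MB = 1_000_000              # assumption: decimal megabytes
TB = 1_000_000_000_000      # assumption: decimal terabytes
MIN_BILLED_MB = 10          # Athena bills at least 10 MB per query


def athena_query_cost(bytes_scanned, price_per_tb=5.0):
    """Estimate one query's cost from bytes scanned (price_per_tb is an assumed list price)."""
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / MB))
    return billed_mb * MB / TB * price_per_tb


raw = athena_query_cost(1 * TB)           # 1 TB of uncompressed text
compressed = athena_query_cost(TB // 4)   # same data at an assumed 4:1 compression ratio
saving = 1 - compressed / raw             # fraction of the per-query cost saved
```

Under these assumptions the query drops from $5.00 to $1.25, a 75% saving; columnar formats can cut the bytes scanned further because queries read only the columns they reference.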
QuickSight Overview
• Integrated with AWS: Redshift, RDS, Athena, S3, IAM, Roles, CloudTrail and more
• Cloud native: fully managed, serverless analytics at scale
• Super fast and easy to use: backed by SPICE (Super-fast, Parallel, In-memory Calculation Engine) and a beautiful UI
• Cost effective: starts at $9 per user per month
Putting it all together…
Summary of AWS Analytics, Database & AI Tools
• Amazon Redshift: enterprise data warehouse
• Amazon EMR: Hadoop/Spark
• Amazon Athena: clusterless SQL
• AWS Glue: clusterless ETL
• Amazon Aurora: managed relational database
• Amazon Machine Learning: predictive analytics
• Amazon QuickSight: business intelligence/visualization
• Amazon Elasticsearch Service: managed Elasticsearch
• Amazon ElastiCache: Redis in-memory datastore
• Amazon DynamoDB: managed NoSQL database
• Amazon Rekognition: deep learning-based image recognition
• Amazon Lex: voice or text chatbots
Queries Against an Amazon S3 Data Lake
Event-driven ETL Pipelines
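Event-driven ETL pipelines typically start from S3 event notifications: when a new object lands, S3 invokes a function such as AWS Lambda with an event listing the bucket and object key. The handler below is a minimal sketch of that first step. The `raw/` and `curated/` prefixes are a hypothetical layout, and the transform itself is left out; the handler only returns the work items it would process.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Minimal Lambda-style handler for S3 ObjectCreated notifications.

    Returns one work item per newly created object under the hypothetical
    "raw/" prefix; a real pipeline would read, transform, and write here.
    """
    jobs = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletions and other event types
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications (spaces as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith("raw/"):
            jobs.append({
                "source": f"s3://{bucket}/{key}",
                # Hypothetical layout: cleaned output goes under a parallel "curated/" prefix.
                "target": f"s3://{bucket}/curated/{key[len('raw/'):]}",
            })
    return jobs
```

Filtering on the key prefix also prevents the pipeline from re-triggering itself when it writes its output back into the same bucket.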
Building a Data Lake on AWS
(Diagram: Kinesis Firehose for ingestion and the Athena query service.)
AWS Solution Builder - Data Lake on AWS
• Reference architecture deployment via CloudFormation
• Configures core services to tag, search and catalogue datasets
• Deploys a console to search and browse available datasets
http://amzn.to/2nTVjcp
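The "tag, search and catalogue" idea can be illustrated with a small in-memory catalogue: datasets are registered with a location and tags, then found by tag or keyword. This is a hypothetical sketch of the concept only; `Dataset`, `Catalogue`, and the example tags are invented for illustration and do not reflect how the solution implements its catalogue.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    location: str                                # e.g. an S3 prefix
    description: str = ""
    tags: dict = field(default_factory=dict)     # free-form key/value tags


class Catalogue:
    """Hypothetical in-memory catalogue: register datasets, then search them."""

    def __init__(self):
        self._datasets = {}

    def register(self, dataset):
        self._datasets[dataset.name] = dataset

    def search(self, keyword=None, **tags):
        """Return sorted names matching every given tag and, optionally, a keyword."""
        results = []
        for ds in self._datasets.values():
            if any(ds.tags.get(k) != v for k, v in tags.items()):
                continue
            text = f"{ds.name} {ds.description}".lower()
            if keyword and keyword.lower() not in text:
                continue
            results.append(ds.name)
        return sorted(results)
```

For example, `search(pii="yes")` would list the datasets tagged as containing personal data, which is the kind of lookup that governance and access reviews depend on.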
Processing & Analytics
• Real-time: AWS Lambda, Kinesis Analytics, Kinesis Streams, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Elasticsearch Service
• Batch: EMR (Hadoop, Spark, Presto), Redshift (data warehouse), Athena (query service)
• AI & Predictive: Amazon Lex (speech recognition), Amazon Rekognition, Amazon Polly (text to speech), Machine Learning (predictive analytics)
• BI & Data Visualization
• Transactional & RDBMS: DynamoDB (NoSQL database), Aurora (relational database)
• Kinesis Streams & Firehose
Today's conversation
Business drivers for a Data Lake
Designing and building
Production use cases
Case Study: Re-architecting Compliance
“For our market surveillance systems, we are looking at about 40% [savings with AWS], but the real benefits are the business benefits: We can do things that we physically weren’t able to do before, and that is priceless.”
- Steve Randich, CIO
What FINRA needed
• Infrastructure for its market surveillance platform
• Support for analyzing and storing approximately 75 billion market events every day
Why they chose AWS
• Meets FINRA’s security requirements
• Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3
Benefits realized
• Increased agility, speed, and cost savings
• Estimated savings of $10–20 million annually by using AWS
Thank you!