© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sebastien Menant & Nam Je Cho, Enterprise Solutions Architects
Amazon Web Services
Building a Server-less Data Lake on AWS
Technical 301
Agenda
• What is a Data Lake?
• Why You Need a Data Lake
• Building the Data Lake
• Demo
• Next Steps
What is a Data Lake?
Definition
“A data lake provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs”
- Wikipedia
Characteristics of a Data Lake
• Collect Everything
• Dive in Anywhere
• Flexible Access
Why You Need a Data Lake
What About Modern Business Needs?
Big Data… and The Hadoop Ecosystem
But Both are Complementary
[Diagram: Amazon EMR and Amazon Redshift]
[Diagram: STORAGE on Amazon S3, with several separate COMPUTE clusters on Amazon EMR]
New Business Outcomes and Capabilities
• Enable New Insights in Your Data
• Cost Savings of Compute and Storage
• Use the Right Tool for the Job
• Increase Durability of Data
• Charge Storage Costs to Owner
• Streaming and Real-time Analysis
Retain all your data, for years!
Building the Data Lake
Beware
Building Blocks of the Data Lake
Storage and Ingestion
Catalogue and Search
Security
API and UI
Storage and Ingestion
Requirements for Storage
• Multi-year Scalable Storage Capability
• High Durability
• Store Raw Data from Any Input Sources
• Support for Any Data Type
• Low Cost
Key Services for Storage
Amazon S3
1. Highly Scalable and Durable
2. Security and Encryption
3. Lifecycle Management
4. Event Notifications
5. Versioning
Amazon Glacier
1. Long-term Archival Storage
2. Lifecycle Integration with S3
3. Extremely Low-cost
4. Vault Lock
[Diagram: Storage and Ingestion. Amazon S3, Amazon Glacier]
Recommendations #1
• S3 Buckets
• Close to Users and Compute
• Select Region for Regulatory Compliance
• Naming
• Human-readable Path
• Random Hash Prefix for Optimal Partitioning
• Format
• Structured vs Unstructured + Compression
• CSV, Parquet, ORC, JSON, XML, logs, etc
• GZIP for small files, Avro, LZO, Snappy
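The naming advice above (a human-readable path combined with a random hash prefix) can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function name `build_object_key`, the path layout `source/dataset/date/filename` and the 4-character prefix length are all assumptions. The advice itself dates from the 2016 deck, so it is worth checking against current S3 performance guidance.

```python
import hashlib


def build_object_key(source: str, dataset: str, date: str, filename: str,
                     prefix_len: int = 4) -> str:
    """Return an S3 object key: a short hash prefix, then a readable path.

    The path layout and prefix length are illustrative assumptions.
    """
    readable = f"{source}/{dataset}/{date}/{filename}"
    # The prefix is derived from the readable path, so the same input
    # always maps to the same key (useful for idempotent re-uploads).
    digest = hashlib.md5(readable.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{readable}"


key = build_object_key("crm", "orders", "2016-04-28", "part-0001.csv.gz")
print(key)
```

Deriving the prefix from the path, rather than generating it randomly, keeps keys reproducible while still spreading objects across prefixes.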
Recommendations #2
• Optimise
• Store Everything
• Use Large Files with Split-able Format
• Lifecycle Policies for Cost-savings
• Tagging for Cost Allocation
• Security
• Encryption
• Bucket Policies, ACL, Tagging, CloudTrail
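As one way to picture "Lifecycle Policies for Cost-savings", the sketch below builds a lifecycle configuration that moves objects under a prefix to Glacier after a number of days. The helper name `lifecycle_rule`, the `raw/` prefix, the 90-day threshold and the bucket name in the comment are assumptions for illustration; the dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
def lifecycle_rule(prefix: str, days_to_glacier: int) -> dict:
    """Build one S3 lifecycle rule that transitions objects to Glacier."""
    if days_to_glacier < 0:
        raise ValueError("days_to_glacier must not be negative")
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
    }


# Hypothetical values: archive raw data after 90 days.
config = {"Rules": [lifecycle_rule("raw/", 90)]}

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=config)
print(config)
```

No expiration action is included, in keeping with the deck's "Store Everything" recommendation.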
Requirements for Ingestion
• Batch File Support
• Traditional ETL
• Streaming Data
• Consumption of any Dataset as a Stream
• Low Latency Analytics
• Replay-ability from the Data Lake
• Server-less ETL Capabilities
Key Services for Ingestion
Amazon Kinesis Firehose
1. Easy to use with Agent
2. Automatic Elasticity
3. Near Real-time
4. Simultaneous Destinations
Amazon Kinesis Streams
1. Enables Custom Processing
2. Continuous Data Collection
3. Real-time
4. API Driven for Custom Apps
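To illustrate "API Driven for Custom Apps", here is a small producer sketch for Kinesis Streams. It assumes a boto3-style Kinesis client whose `put_record` takes `StreamName`, `Data` and `PartitionKey`; the helper names, the `device_id` field and the stream name are hypothetical, and a local stub stands in for the real client so the example runs without AWS access.

```python
import json


def to_stream_record(event: dict, key_field: str) -> dict:
    """Serialise an event and pick its partition key (field name is an assumption)."""
    return {
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }


def send_event(kinesis_client, stream_name: str, event: dict,
               key_field: str = "device_id"):
    """Send one event through a boto3-style Kinesis client."""
    return kinesis_client.put_record(
        StreamName=stream_name, **to_stream_record(event, key_field))


class RecordingClient:
    """Local stand-in for boto3.client("kinesis"), used only for this demo."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {"ShardId": "local-stub"}


client = RecordingClient()
send_event(client, "example-stream", {"device_id": 42, "temp": 21.5})
print(client.calls[0])
```

Passing the client in as a parameter keeps the producer testable and lets the same code run against a real client in production.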
[Diagram: Amazon Kinesis Streams and Amazon Kinesis Firehose. Labels: Data Sources (x5); S3, DynamoDB, Redshift; Amazon Kinesis Stream; Availability Zone (x3); AWS Lambda, KCL App, EMR, Elasticsearch]
[Diagram: Storage and Ingestion. Amazon S3, Amazon Glacier, Amazon Kinesis]
Recommendations
• Reminder
• Added Complexity needs Business Justification
• Select the Right Tools
• Real-time Analysis: Apache Spark Streaming, Storm, Flink
• Firehose to Redshift for BI and Dashboards
• Tips
• AWS Lambda for ETL Transformation
• Persist Streams into S3
http://amzn.to/23DWr5O
http://amzn.to/1SRk8wG
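The tip "AWS Lambda for ETL Transformation" might look like the sketch below: a Lambda handler that decodes records from a Kinesis event and applies a transformation. The event shape (`Records[].kinesis.data`, base64-encoded) is the one Lambda delivers for Kinesis triggers; the `transform` rule (lowercase keys, drop empty values) is purely a hypothetical example of an ETL step.

```python
import base64
import json


def transform(record: dict) -> dict:
    """Hypothetical ETL step: lowercase field names and drop empty values."""
    return {k.lower(): v for k, v in record.items() if v not in (None, "")}


def handler(event, context):
    """Lambda entry point for a Kinesis-triggered function."""
    cleaned = []
    for rec in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(rec["kinesis"]["data"])
        cleaned.append(transform(json.loads(payload)))
    return cleaned


# Local sample event; in AWS the service builds this structure.
raw = json.dumps({"User": "a", "Note": ""}).encode("utf-8")
sample_event = {
    "Records": [{"kinesis": {"data": base64.b64encode(raw).decode("ascii")}}]
}
result = handler(sample_event, None)
print(result)
```

In a fuller pipeline the cleaned records would then be written to S3, in line with the "Persist Streams into S3" tip.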
Catalogue and Search
Requirements for Catalogue and Search
• Metadata Index
• Automated Metadata Processing
• Discovery and Search
• Data Classification
• Server-less and Event-driven
Key Services for Catalogue and Search
AWS Lambda
1. Server-less
2. Event Driven
3. Auto Scaling
4. Real-time
Amazon DynamoDB
1. NoSQL
2. Streams
3. Logstash Plugin
Amazon Elasticsearch Service
1. Deploy Simply
2. Easy Admin
3. Kibana
[Diagram: Catalogue and Search. AWS Lambda, Amazon DynamoDB, Amazon Elasticsearch]
Recommendations
• Tips
• Start Small and Simple… add Capabilities
• File names, size, state, dates, tags, owner
• Region, versions, lineage, relationships
• Search Metadata and Object Content
• Events
• S3 Triggers Lambda
• DynamoDB Streams
• Logstash Plugin to Elasticsearch
http://amzn.to/23E9LUp
http://amzn.to/1TQVBwp
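The "S3 Triggers Lambda" step of the catalogue can be sketched as a handler that turns an S3 event notification into metadata items (file name, size, date, region) and writes them to an index. The item field names are assumptions, and `InMemoryTable` is a local stand-in for a boto3 DynamoDB `Table` (which exposes `put_item(Item=...)`); nothing here is taken from the presenters' demo code.

```python
from urllib.parse import unquote_plus


def metadata_items(event: dict) -> list:
    """Extract one metadata item per object from an S3 event notification."""
    items = []
    for rec in event.get("Records", []):
        s3 = rec["s3"]
        items.append({
            # Object keys are URL-encoded in S3 event notifications.
            "object_key": unquote_plus(s3["object"]["key"]),
            "bucket": s3["bucket"]["name"],
            "size_bytes": s3["object"].get("size", 0),
            "event_time": rec.get("eventTime"),
            "region": rec.get("awsRegion"),
        })
    return items


def handler(event, context, table=None):
    """Lambda entry point; `table` is injected so the sketch runs locally."""
    items = metadata_items(event)
    if table is not None:
        for item in items:
            table.put_item(Item=item)
    return items


class InMemoryTable:
    """Stand-in for a DynamoDB Table resource, used only for this demo."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)


sample_event = {"Records": [{
    "awsRegion": "ap-southeast-2",
    "eventTime": "2016-04-28T00:00:00.000Z",
    "s3": {"bucket": {"name": "example-data-lake"},
           "object": {"key": "raw/orders/my+file.csv", "size": 1024}},
}]}
table = InMemoryTable()
handler(sample_event, None, table)
print(table.items[0])
```

Starting with a handful of fields like these matches the "Start Small and Simple" tip; lineage or relationship fields can be added to the item later.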
Security
Requirements for Security
• Data Encryption at Rest
• Authentication
• Authorisation
Key Services for Security
AWS IAM
1. Users and Roles
2. Identity Federation
3. Multi Factor Authentication
4. Granular Permissions
AWS KMS
1. Seamless Service Integration
2. Extensive Compliance
[Diagram: Security. AWS IAM, AWS KMS, AWS CloudHSM, SSE-S3]
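One common way to back the "Data Encryption at Rest" requirement with SSE-S3 is a bucket policy that rejects uploads lacking the server-side encryption header. The sketch below builds such a policy document; the helper name `require_sse_policy` and the bucket name are assumptions, and the deck itself does not show a policy, so treat this as an illustration rather than the presenters' configuration.

```python
import json


def require_sse_policy(bucket: str, algorithm: str = "AES256") -> dict:
    """Build a bucket policy denying PutObject without the given SSE algorithm.

    "AES256" corresponds to SSE-S3; "aws:kms" would require SSE-KMS instead.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": algorithm
                }
            },
        }],
    }


# Hypothetical bucket name, for illustration only.
policy = require_sse_policy("example-data-lake")
print(json.dumps(policy, indent=2))
```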
Recommendations
• Start Early
• Security Needs Practice!
• Federate with your Corporate Directory
• Best Practice
• Use CloudTrail and CloudWatch
• Encrypt Where Possible
• Select Bucket Region for Regulatory Compliance
• Tips
• IAM Policies, S3 Versioning and MFA Delete
• Lambda for Data Masking
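The "Lambda for Data Masking" tip could be realised with a function like the one below, which replaces sensitive fields with a salted hash before data is exposed. The set of sensitive field names, the salt handling and the 16-character truncation are all assumptions chosen for illustration; a real deployment would keep the salt in a secret store.

```python
import hashlib

# Hypothetical list of fields to mask; a real list would come from the
# data classification held in the metadata index.
SENSITIVE_FIELDS = {"email", "phone"}


def mask_value(value: str, salt: str) -> str:
    """Return a short, deterministic, non-reversible token for a value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def mask_record(record: dict, salt: str = "demo-salt") -> dict:
    """Mask sensitive fields and pass all other fields through unchanged."""
    return {
        key: mask_value(str(value), salt) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }


masked = mask_record({"id": 1, "email": "a@example.com", "city": "Sydney"})
print(masked)
```

Because the token is deterministic for a given salt, masked values can still be used as join keys across datasets.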
API and UI
Requirements for API and UI
• Serve Data and Capabilities to Customers
• Programmatically
• Search Catalogue
• Run Compute
• Extend Access Control Management
• And… Use of Familiar Visualisation Tools
Key Services for API and UI
Amazon API Gateway
1. Performance at Any Scale
2. Create RESTful Frontend
3. Managed API Lifecycle
AWS Lambda
1. Enables Server-less API
2. Custom Logic for Services
3. Automatic Scaling
[Diagram: API and UI. Amazon API Gateway, AWS Lambda]
Recommendations
• Tips
• Go Server-less!
• Extend Existing AWS Services and Build Custom Logic
• Data Management, Processing and Transformations
• API Gateway for Data Access
• Serve the Data, Search and Compute via RESTful APIs
• Distribute a Custom SDK
• Extend the Solution
• Build Advanced Security Controls using Metadata Index
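Serving catalogue search through a RESTful API might be sketched as a Lambda handler behind API Gateway, assuming the Lambda proxy integration (request data in `queryStringParameters`, response as `statusCode`/`headers`/`body`). The `q` query parameter and the in-memory `CATALOGUE` list are hypothetical stand-ins for a real search against the metadata or search index.

```python
import json

# In-memory stand-in for the metadata index, for illustration only.
CATALOGUE = [
    {"object_key": "raw/orders/2016-04-28.csv", "owner": "sales"},
    {"object_key": "raw/clicks/2016-04-28.json", "owner": "web"},
]


def search(term: str) -> list:
    """Case-insensitive substring match on the object key."""
    return [i for i in CATALOGUE if term.lower() in i["object_key"].lower()]


def respond(status: int, payload: dict) -> dict:
    """Build a response in the shape the proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    """Lambda entry point for GET /search?q=term (route is an assumption)."""
    params = event.get("queryStringParameters") or {}
    term = params.get("q", "")
    if not term:
        return respond(400, {"error": "missing query parameter 'q'"})
    return respond(200, {"results": search(term)})


response = handler({"queryStringParameters": {"q": "orders"}}, None)
print(response["statusCode"], response["body"])
```

Access checks based on the metadata index (for example, filtering results by the caller's identity) would slot into `search`, in line with the last recommendation above.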
The Whole Picture…
[Architecture diagram]
Storage and Ingestion: Amazon S3, Amazon Glacier, Amazon Kinesis
Catalogue and Search: AWS Lambda, Amazon DynamoDB, Amazon Elasticsearch
Security: AWS KMS, AWS IAM
API and UI: Amazon API Gateway, AWS Lambda
Also shown: USERS, Amazon EMR, Amazon RDS, Amazon Redshift
A Data Lake is…
• Foundation of Data Storage and Streaming Data
• Metadata index to help Categorise and Govern
• Search Index to Enable Data Discovery
• Robust Set of Security Controls
• Governance Through Technology Not Policy
• Interface to Expose Data and Capabilities to Users
2016-04-28
Demo
Building Catalogue and Search
[Diagram: Data Flow. Labels: Data Source, S3 Bucket, Lambda, DynamoDB, Logstash, Elasticsearch, Metadata Index]
Next Steps
Proof of Concept
Next Steps
• How to Get Started
• AWS Documentation
• Getting Started Guide
• AWS Training & Certification
• Big Data on AWS
• AWS Partner Network
• AWS Professional Services
• Big Data Specialists
AWS Training & Certification
• Intro Videos & Labs: Free videos and labs to help you learn to work with 30+ AWS services – in minutes!
• Training Classes: In-person and online courses to build technical skills – taught by accredited AWS instructors
• Online Labs: Practice working with AWS services in live environment – Learn how related services work together
• AWS Certification: Validate technical skills and expertise – identify qualified IT talent or show you are AWS cloud ready
Learn more: aws.amazon.com/training
Your Training Next Steps:
• Visit the AWS Training & Certification pod to discuss your training plan & AWS Summit training offer
• Register & attend AWS instructor led training
• Get Certified
AWS Certified? Visit the AWS Summit Certification Lounge to pick up your swag
Thank You!

More Related Content

What's hot

Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarAmazon Web Services
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Amazon Web Services
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform Amazon Web Services
 
Choosing the Right Database for the Job: Relational, Cache, or NoSQL?
Choosing the Right Database for the Job: Relational, Cache, or NoSQL?Choosing the Right Database for the Job: Relational, Cache, or NoSQL?
Choosing the Right Database for the Job: Relational, Cache, or NoSQL?Amazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsAmazon Web Services
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewAmazon Web Services
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...Amazon Web Services
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Amazon Web Services
 

What's hot (20)

Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
 
AWS Data Collection & Storage
AWS Data Collection & StorageAWS Data Collection & Storage
AWS Data Collection & Storage
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Choosing the Right Database for the Job: Relational, Cache, or NoSQL?
Choosing the Right Database for the Job: Relational, Cache, or NoSQL?Choosing the Right Database for the Job: Relational, Cache, or NoSQL?
Choosing the Right Database for the Job: Relational, Cache, or NoSQL?
 
Big data on aws
Big data on awsBig data on aws
Big data on aws
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
Securing Your Big Data on AWS
Securing Your Big Data on AWSSecuring Your Big Data on AWS
Securing Your Big Data on AWS
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics Workloads
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – Overview
 
2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days
 
Amazon Kinesis Data Streams
Amazon Kinesis Data StreamsAmazon Kinesis Data Streams
Amazon Kinesis Data Streams
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 

Viewers also liked

AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)Amazon Web Services
 
SaaS Billing strategy - a check list
SaaS Billing strategy - a check listSaaS Billing strategy - a check list
SaaS Billing strategy - a check listLionel Anciaux
 
(ARC304) Designing for SaaS: Next-Generation Software Delivery Models on AWS ...
(ARC304) Designing for SaaS: Next-Generation Software Delivery Models on AWS ...(ARC304) Designing for SaaS: Next-Generation Software Delivery Models on AWS ...
(ARC304) Designing for SaaS: Next-Generation Software Delivery Models on AWS ...Amazon Web Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...Amazon Web Services
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceHortonworks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 

Viewers also liked (7)

AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
 
SaaS Billing strategy - a check list
SaaS Billing strategy - a check listSaaS Billing strategy - a check list
SaaS Billing strategy - a check list
 
(ARC304) Designing for SaaS: Next-Generation Software Delivery Models on AWS ...
(ARC304) Designing for SaaS: Next-Generation Software Delivery Models on AWS ...(ARC304) Designing for SaaS: Next-Generation Software Delivery Models on AWS ...
(ARC304) Designing for SaaS: Next-Generation Software Delivery Models on AWS ...
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 

Similar to Building a Server-less Data Lake on AWS - Technical 301

AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAmazon Web Services
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream ProcessingLuis Gonzalez
 
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingJustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingBEEVA_es
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Amazon Web Services
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big DataAmazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Being Well Architected in the Cloud (Updated)
Being Well Architected in the Cloud (Updated)Being Well Architected in the Cloud (Updated)
Being Well Architected in the Cloud (Updated)Adrian Hornsby
 
Being Well-Architected in the Cloud
Being Well-Architected in the CloudBeing Well-Architected in the Cloud
Being Well-Architected in the CloudAmazon Web Services
 
AWS January 2016 Webinar Series - Getting Started with Big Data on AWS
AWS January 2016 Webinar Series - Getting Started with Big Data on AWSAWS January 2016 Webinar Series - Getting Started with Big Data on AWS
AWS January 2016 Webinar Series - Getting Started with Big Data on AWSAmazon Web Services
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)Amazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersAmazon Web Services
 
Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017
Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017
Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017Amazon Web Services
 

Similar to Building a Server-less Data Lake on AWS - Technical 301 (20)

AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
 
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingJustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
Being Well Architected in the Cloud (Updated)
Being Well Architected in the Cloud (Updated)Being Well Architected in the Cloud (Updated)
Being Well Architected in the Cloud (Updated)
 
Being Well-Architected in the Cloud
Being Well-Architected in the CloudBeing Well-Architected in the Cloud
Being Well-Architected in the Cloud
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Aws meetup 20190427
Aws meetup 20190427Aws meetup 20190427
Aws meetup 20190427
 
AWS January 2016 Webinar Series - Getting Started with Big Data on AWS
AWS January 2016 Webinar Series - Getting Started with Big Data on AWSAWS January 2016 Webinar Series - Getting Started with Big Data on AWS
AWS January 2016 Webinar Series - Getting Started with Big Data on AWS
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million Users
 
Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017
Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017
Deploying a Data Lake in AWS - AWS Summit Tel Aviv 2017
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Building a Server-less Data Lake on AWS - Technical 301

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Sebastien Menant & Nam Je Cho, Enterprise Solutions Architects, Amazon Web Services. Building a Server-less Data Lake on AWS: Technical 301
  • 2. Agenda • What is a Data Lake? • Why You Need a Data Lake • Building the Data Lake • Demo • Next Steps
  • 3. What is a Data Lake?
  • 4. Definition: “A data lake provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs” - Wikipedia
  • 5. Characteristics of a Data Lake • Collect Everything • Dive in Anywhere • Flexible Access
  • 6. Why You Need a Data Lake
  • 8. What About Modern Business Needs?
  • 9. Big Data… and The Hadoop Ecosystem
  • 10. But Both are Complementary • Amazon EMR • Amazon Redshift
  • 13. New Business Outcomes and Capabilities • Enable New Insights in Your Data • Cost Savings of Compute and Storage • Use the Right Tool for the Job • Increase Durability of Data • Charge Storage Costs to Owner • Streaming and Real-time Analysis • Retain all your data, for years!
  • 16. Building Blocks of the Data Lake • Storage and Ingestion • Catalogue and Search • Security • API and UI
  • 17. Storage and Ingestion [building-block diagram: Storage and Ingestion, Catalogue and Search, Security, API and UI]
  • 18. Requirements for Storage • Multi-year Scalable Storage Capability • High Durability • Store Raw Data from Any Input Sources • Support for Any Data Type • Low Cost
  • 19. Key Services for Storage • Amazon S3: 1. Highly Scalable and Durable 2. Security and Encryption 3. Lifecycle Management 4. Event Notifications 5. Versioning • Amazon Glacier: 1. Long-term Archival Storage 2. Lifecycle Integration with S3 3. Extremely Low-cost 4. Vault Lock
  • 21. Recommendations #1 • S3 Buckets: Close to Users and Compute; Select Region for Regulatory Compliance • Naming: Human-readable Path; Random Hash Prefix for Optimal Partitioning • Format: Structured vs Unstructured + Compression; CSV, Parquet, ORC, JSON, XML, logs, etc.; GZIP for small files, Avro, LZO, Snappy
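The naming advice on slide 21 (a human-readable path combined with a random hash prefix) can be illustrated with a short Python sketch. The function name `hashed_key`, the four-character prefix length, and the example path are assumptions made for illustration and do not come from the deck. Later S3 guidance reduced the need for hashed prefixes, so read this as an illustration of the slide's 2016-era recommendation rather than current best practice.

```python
import hashlib


def hashed_key(readable_path: str, prefix_len: int = 4) -> str:
    """Prepend a short, deterministic hash to a human-readable object key.

    The hash spreads keys across the key space while the rest of the
    key stays readable. Using a digest of the path (rather than a truly
    random value) means the key can be recomputed from the path alone.
    """
    digest = hashlib.sha256(readable_path.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{readable_path}"


# Hypothetical object path, for illustration only.
key = hashed_key("sales/2016/04/28/orders.csv.gz")
print(key)
```

Because the prefix is derived from the path, readers of the data lake can locate an object without a lookup table, at the cost of losing simple prefix listing by date.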
  • 22. Recommendations #2 • Optimise: Store Everything; Use Large Files with Splittable Format; Lifecycle Policies for Cost-savings; Tagging for Cost Allocation • Security: Encryption; Bucket Policies, ACL, Tagging, CloudTrail
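As a rough sketch of the "Lifecycle Policies for Cost-savings" item on slide 22, the following Python builds a lifecycle configuration that moves objects under a prefix to Amazon Glacier after a number of days. The prefix `raw/`, the 90-day threshold, the rule ID, and the helper names are assumptions for illustration. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts, and the call itself is isolated in a separate function so the builder runs without AWS credentials.

```python
def lifecycle_rules(prefix: str, archive_after_days: int) -> dict:
    """Build a lifecycle configuration that archives a prefix to Glacier."""
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


# Assumed values: archive everything under raw/ after 90 days.
config = lifecycle_rules("raw/", archive_after_days=90)
```

No expiration rule is included, in keeping with the deck's "Store Everything" advice.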
  • 23. Requirements for Ingestion • Batch File Support • Traditional ETL • Streaming Data • Consumption of any Dataset as a Stream • Low Latency Analytics • Replay-ability from the Data Lake • Server-less ETL Capabilities
  • 24. Key Services for Ingestion • Amazon Kinesis Firehose: 1. Easy to use with Agent 2. Automatic Elasticity 3. Near Real-time 4. Simultaneous Destinations • Amazon Kinesis Streams: 1. Enables Custom Processing 2. Continuous Data Collection 3. Real-time 4. API Driven for Custom Apps
  • 25. [Diagram labels: Data Sources; Amazon Kinesis Stream across three Availability Zones; AWS Lambda, KCL App, EMR; S3, DynamoDB, Redshift, Elasticsearch]
  • 26. Storage and Ingestion [diagram: Amazon S3, Amazon Glacier, Amazon Kinesis]
  • 27. Recommendations • Reminder: Added Complexity needs Business Justification • Select the Right Tools: Real-time Analysis with Apache Spark Streaming, Storm, Flink; Firehose to Redshift for BI and Dashboards • Tips: AWS Lambda for ETL Transformation; Persist Streams into S3
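The "AWS Lambda for ETL Transformation" tip on slide 27 might look roughly like the Python handler below, which reads records from a Kinesis Streams event, decodes the base64 payload, and applies a transformation. The `transform` step (lower-casing field names and dropping empty values) is a made-up example, and the assumption that every record carries a JSON object is ours; the deck does not describe the actual transformation used.

```python
import base64
import json


def transform(record: dict) -> dict:
    """Hypothetical ETL step: lower-case field names and drop empty values."""
    return {k.lower(): v for k, v in record.items() if v not in (None, "")}


def handler(event, context):
    """Lambda entry point for a Kinesis Streams event source.

    Each record's payload arrives base64-encoded under kinesis.data.
    The transformed records are returned here for illustration; a real
    function would typically persist them, for example to S3.
    """
    transformed = []
    for rec in event["Records"]:
        payload = base64.b64decode(rec["kinesis"]["data"])
        transformed.append(transform(json.loads(payload)))
    return transformed
```

Keeping `transform` separate from the handler makes the ETL logic testable without a stream.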
  • 30. Catalogue and Search [building-block diagram: Storage and Ingestion, Catalogue and Search, Security, API and UI]
  • 31. Requirements for Catalogue and Search • Metadata Index • Automated Metadata Processing • Discovery and Search • Data Classification • Server-less and Event-driven
  • 32. Key Services for Catalogue and Search • AWS Lambda: 1. Server-less 2. Event Driven 3. Auto Scaling 4. Real-time • Amazon DynamoDB: 1. NoSQL 2. Streams 3. Logstash Plugin • Amazon Elasticsearch Service: 1. Deploy Simply 2. Easy Admin 3. Kibana
  • 33. Catalogue and Search [diagram: AWS Lambda, Amazon DynamoDB, Amazon Elasticsearch]
  • 34. Recommendations • Tips: Start Small and Simple… add Capabilities; File names, size, state, dates, tags, owner; Region, versions, lineage, relationships; Search Metadata and Object Content • Events: S3 Triggers Lambda; DynamoDB Streams; Logstash Plugin to Elasticsearch
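The "S3 Triggers Lambda" flow on slide 34 can be sketched as a Python handler that turns an S3 event notification into metadata items for the index. The item field names and the idea of writing to a DynamoDB table named `catalogue` are assumptions for illustration; the deck lists the kinds of metadata to capture (file names, size, state, dates, region, versions) but not a schema.

```python
from urllib.parse import unquote_plus


def metadata_items(event: dict) -> list:
    """Extract one metadata item per object from an S3 event notification."""
    items = []
    for rec in event.get("Records", []):
        obj = rec["s3"]["object"]
        items.append(
            {
                "bucket": rec["s3"]["bucket"]["name"],
                # Object keys arrive URL-encoded in S3 notifications.
                "key": unquote_plus(obj["key"]),
                "size": obj.get("size", 0),
                "version": obj.get("versionId"),
                "region": rec.get("awsRegion"),
                "event_time": rec.get("eventTime"),
                "state": rec.get("eventName"),
            }
        )
    return items


def handler(event, context):
    """Lambda entry point: build the items, then index them.

    Writing to the index is left as a comment because the table is
    hypothetical, e.g. with boto3:
    boto3.resource("dynamodb").Table("catalogue").put_item(Item=item)
    """
    return metadata_items(event)
```

Starting with these few fields matches the "Start Small and Simple" tip; lineage and relationships could be added to the item later.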
  • 37. Security [building-block diagram: Storage and Ingestion, Catalogue and Search, Security, API and UI]
  • 38. Requirements for Security • Data Encryption at Rest • Authentication • Authorisation
  • 39. Key Services for Security • AWS IAM: 1. Users and Roles 2. Identity Federation 3. Multi Factor Authentication 4. Granular Permissions • AWS KMS: 1. Seamless Service Integration 2. Extensive Compliance [diagram: AWS IAM, AWS KMS, AWS CloudHSM, SSE-S3]
  • 41. Recommendations • Start Early: Security Needs Practice!; Federate with your Corporate Directory • Best Practice: Use CloudTrail and CloudWatch; Encrypt Where Possible; Select Bucket Region for Regulatory Compliance • Tips: IAM Policies, S3 Versioning and MFA Delete; Lambda for Data Masking
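One way to act on "Encrypt Where Possible" from slide 41 is a bucket policy that rejects uploads lacking a server-side encryption header. The Python sketch below generates such a policy document. The bucket name is a placeholder and the helper name `require_sse_policy` is an assumption; the deck names bucket policies and SSE-S3 but does not show a specific policy.

```python
import json


def require_sse_policy(bucket: str, algorithm: str = "AES256") -> str:
    """Return a bucket policy (as JSON) denying unencrypted PutObject calls.

    algorithm is "AES256" for SSE-S3 or "aws:kms" for SSE-KMS.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": algorithm
                    }
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder bucket name, for illustration only.
print(require_sse_policy("example-data-lake-bucket"))
```

The resulting JSON can be attached to the bucket through the console, the CLI, or an SDK call.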
  • 42. API and UI [building-block diagram: Storage and Ingestion, Catalogue and Search, Security, API and UI]
  • 43. Requirements for API and UI • Serve Data and Capabilities to Customers • Programmatically • Search Catalogue • Run Compute • Extend Access Control Management • And… Use of Familiar Visualisation Tools
  • 44. Key Services for API and UI • Amazon API Gateway: 1. Performance at Any Scale 2. Create RESTful Frontend 3. Managed API Lifecycle • AWS Lambda: 1. Enables Server-less API 2. Custom Logic for Services 3. Automatic Scaling
  • 45. API and UI [diagram: Amazon API Gateway, AWS Lambda]
  • 46. Recommendations • Tips: Go Server-less!; Extend Existing AWS Services and Build Custom Logic; Data Management, Processing and Transformations • API Gateway for Data Access: Serve the Data, Search and Compute via RESTful APIs; Distribute a Custom SDK • Extend the Solution: Build Advanced Security Controls using Metadata Index
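Serving catalogue search "via RESTful APIs" (slide 46) could be sketched as a Lambda function behind Amazon API Gateway. The Python below assumes the Lambda proxy integration event shape (`queryStringParameters` in, `statusCode`/`headers`/`body` out), a query parameter named `q`, and an in-memory `CATALOGUE` list standing in for the real metadata index. All three are illustrative assumptions, not details from the deck.

```python
import json

# Stand-in for the metadata index; a real function would query
# DynamoDB or Elasticsearch instead. Entries are invented examples.
CATALOGUE = [
    {"key": "sales/2016/04/orders.csv.gz", "owner": "finance"},
    {"key": "logs/2016/04/web.log.gz", "owner": "platform"},
]


def search(term: str) -> list:
    """Case-insensitive substring match over key and owner."""
    term = term.lower()
    return [
        entry
        for entry in CATALOGUE
        if term in entry["key"].lower() or term in entry["owner"].lower()
    ]


def respond(status: int, payload: dict) -> dict:
    """Build a response in the shape the proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    """Lambda entry point for GET requests such as /search?q=orders."""
    params = event.get("queryStringParameters") or {}
    term = params.get("q", "")
    if not term:
        return respond(400, {"error": "missing query parameter 'q'"})
    return respond(200, {"results": search(term)})
```

The same handler pattern extends to the other capabilities the slide mentions, such as triggering compute, by routing on the request path.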
  • 47. The Whole Picture… [building-block diagram: Storage and Ingestion, Catalogue and Search, Security, API and UI]
  • 48. [Diagram labels: Storage and Ingestion (Amazon S3, Amazon Glacier, Amazon Kinesis); Catalogue and Search (AWS Lambda, Amazon DynamoDB, Amazon Elasticsearch); Security (AWS KMS, AWS IAM); API and UI (Amazon API Gateway, AWS Lambda); Users; Amazon EMR, Amazon RDS, Amazon Redshift]
  • 49. A Data Lake is… • Foundation of Data Storage and Streaming Data • Metadata Index to help Categorise and Govern • Search Index to Enable Data Discovery • Robust Set of Security Controls • Governance Through Technology Not Policy • Interface to Expose Data and Capabilities to Users
  • 50. 2016-04-28 Demo
  • 51. Demo
  • 52. Building Catalogue and Search [diagram labels: Data Source, S3 Bucket, Lambda, DynamoDB, Metadata Index, Logstash, Elasticsearch; Data Flow]
  • 55. Next Steps • How to Get Started • AWS Documentation • Getting Started Guide • AWS Training & Certification • Big Data on AWS • AWS Partner Network • AWS Professional Services • Big Data Specialists
  • 56. AWS Training & Certification • Intro Videos & Labs: Free videos and labs to help you learn to work with 30+ AWS services, in minutes! • Training Classes: In-person and online courses to build technical skills, taught by accredited AWS instructors • Online Labs: Practice working with AWS services in a live environment; learn how related services work together • AWS Certification: Validate technical skills and expertise; identify qualified IT talent or show you are AWS cloud ready. Learn more: aws.amazon.com/training
  • 57. Your Training Next Steps: • Visit the AWS Training & Certification pod to discuss your training plan & AWS Summit training offer • Register & attend AWS instructor-led training • Get Certified. AWS Certified? Visit the AWS Summit Certification Lounge to pick up your swag. Learn more: aws.amazon.com/training