SlideShare a Scribd company logo
1 of 28
Download to read offline
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Building an End-to-End Serverless
Data Analytics Solution on AWS
G o w r i B a l a s u b r a m a n i a n , A W S S o l u t i o n A r c h i t e c t
K a r t h i k K u m a r O d a p a l l y , A W S S o l u t i o n A r c h i t e c t
R a j e e v S r i n i v a s a n , A W S S o l u t i o n A r c h i t e c t
R u d y C h e t t y , A W S S o l u t i o n A r c h i t e c t
A B D 3 1 3
N o v e m b e r 2 7 , 2 0 1 7
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda
• Presentation
 Service Introduction
 Reference Architecture
 Query Performance Best Practices
 Workshop Overview
• Hands on Workshop
 Lab1: Serverless Analysis of Data in Amazon Simple Storage (Amazon S3) using Amazon Athena
 Lab2: Visualization Using Amazon QuickSight
 Lab3: Serverless ETL and Data Discovery Using Amazon Glue [Optional]
 Lab4: Analysis of Data in Amazon S3 Using Amazon Redshift Spectrum [Take Home]
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
SERVICE INTRODUCTION
A M A Z O N A T H E N A A M A Z O N Q U I C K S I G H T A W S G L U E
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Athena
Start Querying Instantly
Serverless. No ETL.
Pay per Query
Only pay for data scanned.
Open. Powerful. Standard.
Built on Presto. Runs standard SQL.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon QuickSight
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon QuickSight—Data Sources
S P I C E
S P I C E
Amazon
Athena
Amazon
S3
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon QuickSight—Data Sources
Amazon
Redshift
Redshift
Spectrum
S P I C E
S P I C E
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue
Integrated
Data Catalog
Automated
Data Discovery
Code
Generation
Developer
Endpoints
Flexible
Job Scheduler
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue—Components
 Hive Metastore compatible with enhanced functionality
 Crawlers automatically extracts metadata and create tables
 Integrated with Amazon Athena, Amazon Redshift Spectrum
 Run jobs on a serverless Spark platform
 Provides flexible scheduling
 Handles dependency resolution, monitoring, and alerting
 Auto-generates ETL code
 Build on open frameworks—Python and Spark
 Developer-centric—editing, debugging, sharing
Data Catalog
Job Authoring
Job Execution
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Customer Reference—Amazon QuickSight
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Customer Reference—Amazon Athena
One of the big attractions of Amazon
Athena is that it’s serverless and purely
consumption based. We only pay when
we’re actually querying the data, and we
don’t have to keep a cluster running all the
time.
–Matt Chesler,
Director of DevOps, Movable Ink
”
“
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Reference Architecture
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
On-Premises Analytics Pipeline
Always OnStatic : Not Scalable Outages Impact Storage Compute
Database
User
Reporting
IOT Devices
Application
Server Logs
Data Source
On-Premise Hadoop
Cluster
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Reference Architecture
Landing
Amazon S3 Bucket
Amazon Athena
Query data
Amazon QuickSight
Visualization
Kinesis Firehose
Real-Time Data
Collection
Data Export
AWS DMS
Log Aggregation
AWS Service Logs
Web Application Logs
Server Logs
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Reference Architecture—ETL
Landing
Amazon S3 Bucket
Amazon Athena
Glue Data
Catalog
Query data
Amazon S3
Glue ETL Glue Crawler
Amazon QuickSight
Visualization
Kinesis Firehose
Real-Time Data
Collection
Data Export
AWS DMS
Log Aggregation
AWS Service Logs
Web Application Logs
Server Logs
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Reference Architecture
Amazon
QuickSight
4. Visualize all your data
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Query Performance
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Best Practices—Storage
Partition your data
Optimize columnar data store generation
Compress and split files
Optimize file size
SELECT count(*) as count FROM taxi_rides_csv
(Run time: 20.06 seconds, Data scanned: 207.54GB, Row Count: 1,310,911,060)
SELECT count(*) as count FROM taxi_rides_parquet
(Run time: 5.76 seconds, Data scanned: 0KB, Row Count: 2,870,781,820)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Best Practices—Query
 Optimize ORDER BY
 Optimize joins
 Optimize GROUP BY
 Optimize the LIKE operator
SELECT * FROM nytaxirides WHERE year = 2011 AND month = 5 AND type = 'yellow’ ORDER BY ratecode
(Run time: 3 minutes 6 seconds)
SELECT * FROM nytaxirides WHERE year = 2011 AND month = 5 AND type = 'yellow’ ORDER BY ratecode LIMIT 1000
(Run time: 3.01 seconds)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Best Practices—Query
 Use approximate functions
 Only include the columns that you need
SELECT count(distinct tpep_pickup_datetime) FROM nytaxidata
(Run time: 30.82 seconds)
SELECT approx_distinct(tpep_pickup_datetime) FROM nytaxidata
(Run time: 25.21 seconds)
SELECT * FROM nytaxirides WHERE year = 2011 AND type = 'yellow' AND month = 5
(Run time: 2 minutes 59 seconds, Data scanned: 382.88MB)
SELECT vendorid, ratecode, passenger_count FROM nytaxirides WHERE year = 2011 AND type = 'yellow' AND month = 5
(Run time: 38.79 seconds, Data scanned: 10.06MB)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Hands-On Workshop
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lab 1: Serverless Analysis Using Amazon Athena
Amazon S3 Bucket
(CSV Format)
Amazon Athena
Amazon S3 Bucket
(Parquet Format)
Athena Data
Catalog
Metadata
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lab 2: Visualization Using Amazon QuickSight
Amazon S3 Bucket
(CSV Format)
Amazon Athena
Amazon S3 Bucket
(Parquet Format)
Visualization
Amazon QuickSight
Athena Data
Catalog
Metadata
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lab 3: ELT and Data Discovery Using AWS Glue
Amazon S3 Bucket
(CSV Format)
Amazon Athena
Your S3 Bucket
(Parquet Format)
AWS Glue ETL
CSV to Parquet
AWS Glue Crawler
AWS Glue Crawler
Metadata
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lab 4: Analysis Using Redshift Spectrum
Amazon S3 Bucket
(CSV Format)
Amazon Redshift Spectrum
Amazon S3 Bucket
(Parquet Format)
AWS Glue Crawler
AWS Glue Crawler
Metadata
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Analysis & Visualization Pipeline on AWS
Generate Collect Store
Extract
Transform
Load
Analyze Visualize/Report
IOT Devices
Application
Server Logs
Kinesis Stream
Kinesis Firehose
Polling Application AWS Glue
Amazon Redshift &
Redshift Spectrum
Amazon Athena
Kinesis Analytics
Amazon QuickSight
Amazon S3
Amazon RDS
Database on EC2 Amazon EMR
Lab2
Lab1
Lab3
Lab4
AWS Lambda
Kinesis Enabled
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Workshop
• Please collect the credit coupon. You can apply this coupon towards completing the labs in this workshop.
• Create an AWS Account, if you don’t have one. Please do not use your production account for the labs.
• Provide your AWS Account ID for whitelisting to any of the AWS personnel who are staffing the workshop.
Choose Support on the navigation bar on the upper right, and then choose Support Center. Your currently
signed-in account ID appears in the upper-right corner below the Support menu.
• Navigate to the following web link for workshop lab instruction
http://bit.ly/2jgx6vd
• Choose Oregon region for the labs.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!
Please complete your survey

More Related Content

More from Amazon Web Services

Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSAmazon Web Services
 
AWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei serverAWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei serverAmazon Web Services
 
Crea dashboard interattive con Amazon QuickSight
Crea dashboard interattive con Amazon QuickSightCrea dashboard interattive con Amazon QuickSight
Crea dashboard interattive con Amazon QuickSightAmazon Web Services
 
Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Costruisci modelli di Machine Learning con Amazon SageMaker AutopilotCostruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Costruisci modelli di Machine Learning con Amazon SageMaker AutopilotAmazon Web Services
 
Migra le tue file shares in cloud con FSx for Windows
Migra le tue file shares in cloud con FSx for Windows Migra le tue file shares in cloud con FSx for Windows
Migra le tue file shares in cloud con FSx for Windows Amazon Web Services
 
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?Amazon Web Services
 
Protect your applications from DDoS/BOT & Advanced Attacks
Protect your applications from DDoS/BOT & Advanced AttacksProtect your applications from DDoS/BOT & Advanced Attacks
Protect your applications from DDoS/BOT & Advanced AttacksAmazon Web Services
 
Track 6 Session 6_ 透過 AWS AI 服務模擬、部署機器人於產業之應用
Track 6 Session 6_ 透過 AWS AI 服務模擬、部署機器人於產業之應用Track 6 Session 6_ 透過 AWS AI 服務模擬、部署機器人於產業之應用
Track 6 Session 6_ 透過 AWS AI 服務模擬、部署機器人於產業之應用Amazon Web Services
 

More from Amazon Web Services (20)

Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
 
AWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei serverAWS Serverless per startup: come innovare senza preoccuparsi dei server
AWS Serverless per startup: come innovare senza preoccuparsi dei server
 
Crea dashboard interattive con Amazon QuickSight
Crea dashboard interattive con Amazon QuickSightCrea dashboard interattive con Amazon QuickSight
Crea dashboard interattive con Amazon QuickSight
 
Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Costruisci modelli di Machine Learning con Amazon SageMaker AutopilotCostruisci modelli di Machine Learning con Amazon SageMaker Autopilot
Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot
 
Migra le tue file shares in cloud con FSx for Windows
Migra le tue file shares in cloud con FSx for Windows Migra le tue file shares in cloud con FSx for Windows
Migra le tue file shares in cloud con FSx for Windows
 
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
La tua organizzazione è pronta per adottare una strategia di cloud ibrido?
 
Protect your applications from DDoS/BOT & Advanced Attacks
Protect your applications from DDoS/BOT & Advanced AttacksProtect your applications from DDoS/BOT & Advanced Attacks
Protect your applications from DDoS/BOT & Advanced Attacks
 
Track 6 Session 6_ 透過 AWS AI 服務模擬、部署機器人於產業之應用
Track 6 Session 6_ 透過 AWS AI 服務模擬、部署機器人於產業之應用Track 6 Session 6_ 透過 AWS AI 服務模擬、部署機器人於產業之應用
Track 6 Session 6_ 透過 AWS AI 服務模擬、部署機器人於產業之應用
 

ABD313-Building an End-to-End Serverless Data Analytics Solution on AWS

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT Building an End-to-End Serverless Data Analytics Solution on AWS G o w r i B a l a s u b r a m a n i a n , A W S S o l u t i o n A r c h i t e c t K a r t h i k K u m a r O d a p a l l y , A W S S o l u t i o n A r c h i t e c t R a j e e v S r i n i v a s a n , A W S S o l u t i o n A r c h i t e c t R u d y C h e t t y , A W S S o l u t i o n A r c h i t e c t A B D 3 1 3 N o v e m b e r 2 7 , 2 0 1 7
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda • Presentation  Service Introduction  Reference Architecture  Query Performance Best Practices  Workshop Overview • Hands on Workshop  Lab1: Serverless Analysis of Data in Amazon Simple Storage (Amazon S3) using Amazon Athena  Lab2: Visualization Using Amazon QuickSight  Lab3: Serverless ETL and Data Discovery Using Amazon Glue [Optional]  Lab4: Analysis of Data in Amazon S3 Using Amazon Redshift Spectrum [Take Home]
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. SERVICE INTRODUCTION A M A Z O N A T H E N A A M A Z O N Q U I C K S I G H T A W S G L U E
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Athena Start Querying Instantly Serverless. No ETL. Pay per Query Only pay for data scanned. Open. Powerful. Standard. Built on Presto. Runs standard SQL.
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon QuickSight
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon QuickSight—Data Sources S P I C E S P I C E Amazon Athena Amazon S3
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon QuickSight—Data Sources Amazon Redshift Redshift Spectrum S P I C E S P I C E
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue Integrated Data Catalog Automated Data Discovery Code Generation Developer Endpoints Flexible Job Scheduler
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue—Components  Hive Metastore compatible with enhanced functionality  Crawlers automatically extracts metadata and create tables  Integrated with Amazon Athena, Amazon Redshift Spectrum  Run jobs on a serverless Spark platform  Provides flexible scheduling  Handles dependency resolution, monitoring, and alerting  Auto-generates ETL code  Build on open frameworks—Python and Spark  Developer-centric—editing, debugging, sharing Data Catalog Job Authoring Job Execution
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Customer Reference—Amazon QuickSight
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Customer Reference—Amazon Athena One of the big attractions of Amazon Athena is that it’s serverless and purely consumption based. We only pay when we’re actually querying the data, and we don’t have to keep a cluster running all the time. –Matt Chesler, Director of DevOps, Movable Ink ” “
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Reference Architecture
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. On-Premises Analytics Pipeline Always OnStatic : Not Scalable Outages Impact Storage Compute Database User Reporting IOT Devices Application Server Logs Data Source On-Premise Hadoop Cluster
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Reference Architecture Landing Amazon S3 Bucket Amazon Athena Query data Amazon QuickSight Visualization Kinesis Firehose Real-Time Data Collection Data Export AWS DMS Log Aggregation AWS Service Logs Web Application Logs Server Logs
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Reference Architecture—ETL Landing Amazon S3 Bucket Amazon Athena Glue Data Catalog Query data Amazon S3 Glue ETL Glue Crawler Amazon QuickSight Visualization Kinesis Firehose Real-Time Data Collection Data Export AWS DMS Log Aggregation AWS Service Logs Web Application Logs Server Logs
  • 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Reference Architecture Amazon QuickSight 4. Visualize all your data
  • 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Query Performance
  • 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Best Practices—Storage Partition your data Optimize columnar data store generation Compress and split files Optimize file size SELECT count(*) as count FROM taxi_rides_csv (Run time: 20.06 seconds, Data scanned: 207.54GB, Row Count: 1,310,911,060) SELECT count(*) as count FROM taxi_rides_parquet (Run time: 5.76 seconds, Data scanned: 0KB, Row Count: 2,870,781,820)
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Best Practices—Query  Optimize ORDER BY  Optimize joins  Optimize GROUP BY  Optimize the LIKE operator SELECT * FROM nytaxirides WHERE year = 2011 AND month = 5 AND type = 'yellow’ ORDER BY ratecode (Run time: 3 minutes 6 seconds) SELECT * FROM nytaxirides WHERE year = 2011 AND month = 5 AND type = 'yellow’ ORDER BY ratecode LIMIT 1000 (Run time: 3.01 seconds)
  • 20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Best Practices—Query  Use approximate functions  Only include the columns that you need SELECT count(distinct tpep_pickup_datetime) FROM nytaxidata (Run time: 30.82 seconds) SELECT approx_distinct(tpep_pickup_datetime) FROM nytaxidata (Run time: 25.21 seconds) SELECT * FROM nytaxirides WHERE year = 2011 AND type = 'yellow' AND month = 5 (Run time: 2 minutes 59 seconds, Data scanned: 382.88MB) SELECT vendorid, ratecode, passenger_count FROM nytaxirides WHERE year = 2011 AND type = 'yellow' AND month = 5 (Run time: 38.79 seconds, Data scanned: 10.06MB)
  • 21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Hands-On Workshop
  • 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lab 1: Serverless Analysis Using Amazon Athena Amazon S3 Bucket (CSV Format) Amazon Athena Amazon S3 Bucket (Parquet Format) Athena Data Catalog Metadata
  • 23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lab 2: Visualization Using Amazon QuickSight Amazon S3 Bucket (CSV Format) Amazon Athena Amazon S3 Bucket (Parquet Format) Visualization Amazon QuickSight Athena Data Catalog Metadata
  • 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lab 3: ELT and Data Discovery Using AWS Glue Amazon S3 Bucket (CSV Format) Amazon Athena Your S3 Bucket (Parquet Format) AWS Glue ETL CSV to Parquet AWS Glue Crawler AWS Glue Crawler Metadata
  • 25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lab 4: Analysis Using Redshift Spectrum Amazon S3 Bucket (CSV Format) Amazon Redshift Spectrum Amazon S3 Bucket (Parquet Format) AWS Glue Crawler AWS Glue Crawler Metadata
  • 26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Analysis & Visualization Pipeline on AWS Generate Collect Store Extract Transform Load Analyze Visualize/Report IOT Devices Application Server Logs Kinesis Stream Kinesis Firehose Polling Application AWS Glue Amazon Redshift & Redshift Spectrum Amazon Athena Kinesis Analytics Amazon QuickSight Amazon S3 Amazon RDS Database on EC2 Amazon EMR Lab2 Lab1 Lab3 Lab4 AWS Lambda Kinesis Enabled
  • 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Workshop • Please collect the credit coupon. You can apply this coupon towards completing the labs in this workshop. • Create an AWS Account, if you don’t have one. Please do not use your production account for the labs. • Provide your AWS Account ID for whitelisting to any of the AWS personnel who are staffing the workshop. Choose Support on the navigation bar on the upper right, and then choose Support Center. Your currently signed-in account ID appears in the upper-right corner below the Support menu. • Navigate to the following web link for workshop lab instruction http://bit.ly/2jgx6vd • Choose Oregon region for the labs.
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you! Please complete your survey