SlideShare a Scribd company logo
1 of 58
Getting Started with Big Data and
HPC in the Cloud
KD Singh
AWS Solutions Architect
kdsingh@amazon.com
BIG DATA :
Big Data Challenges:
Capacity
Planning &
Scalability
Lower Cost,
OpEx
Experiment &
learn more
Trad.
DWHs
IT
Complexity
Data
Variety…
..Volume,
velocity
Old Answers &
Questions
Managed
Services
Fully managed,
secured &
automated
services that
brings agility &
focus
S3, EMR,
Kinesis,
DynamoDB:
Collect all data, do
Complex
computations and
processing it, both
in Real-Time &
Batch
Sensors
(IoT)
Social
Images
Videos
E. Apps.
Documents
Web Logs
Big Value
Redshift:
Super Fast, MPP,
Petabyte Ready,
analytical Data
Warehouse
available in
minutes
Virtually
unlimited &
Elastic
Resources
No heavy lifting &
Reduced Time to
Market, parallel
processing on
demand
New
Answers/questions
& Business Ideas
Extract the meaning
from all your data &
focus on new
business Ideas,
Models, etc..
High Cost &
Commitment
Plethora of Tools
Glacier
S3 DynamoDB
RDS
EMR
Redshift
Data Pipeline
Kinesis
Cassandra
CloudSearch
Ingest Store Analyze Visualize
Data Answers
Time
Simplify Data Analytics Flow
Multiple stages
Storage decoupled from processing
Glacier
S3
DynamoDB
RDS
Kinesis
Spark
Streaming
EMR
Ingest Store Process/Analyze Visualize
Data Pipeline
Storm
Kafka
Redshift
Cassandra
CloudSearch
Kinesis
Connector
Kinesis
enabled app
App Server
Web Server
Devices
AWS Big Data Portfolio
Collect / Ingest
Kinesis
Process / Analyze
EMR EC2
Redshift Data Pipeline
Visualize / ReportStore
Glacier
S3
DynamoDB
RDS
Import Export
Direct Connect
Amazon SQS
Industries using AWS for data analysis
Mobile / Cable
Telecom
Oil and Gas
Industrial
Manufacturing
Retail/Consumer
Entertainment
Hospitality
Life Sciences
Scientific
Exploration
Financial
Services
Publishing Media
Advertising
Online Media
Social Network
Gaming
Ingest: The act of collecting and storing data
Types of Data Ingest
• Transactional
– Database reads/writes
• File
– Media files, log files
• Stream
– Click-stream logs (sets of
events)
Database
Cloud
Storage
Stream
Storage
LoggingFrameworksDevicesApps
Real-time processing of streaming data
High throughput
Elastic
Easy to use
Connectors for EMR, S3, Redshift,
DynamoDB
Amazon
Kinesis
Kinesis Stream:
Managed ability to capture and store data
• Streams are made of Shards
• Each Shard ingests data up to
1MB/sec, and up to 1000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by adding or
removing Shards
• Replay data inside of 24Hr. Window
• Kinesis Client Library (KCL) - Java
client library, available on Github
Sending and Reading Data from Kinesis Streams
HTTP Post
AWS SDK
LOG4J
Flume
Fluentd
Get* APIs
Kinesis Client Library
+
Connector Library
Apache Storm
Amazon Elastic
MapReduce
Sending Reading
Write Read
AWS Partners for Ingest, Data Load and Transformation
Hparser, Big Data Edition
Flume, Sqoop
Storage
Cloud Database and Storage Tier Anti-pattern
App/Web Tier
Client Tier
Database & Storage Tier
Cloud Database and Storage Tier — Use the Right Tool for the Job!
App/Web Tier
Client Tier
Data Tier
Database & Storage Tier
Search
Hadoop/HDFS
Cache
Blob Store
SQL NoSQL
Database & Storage Tier
Amazon RDSAmazon
DynamoDB
Amazon
ElastiCache
Amazon S3
Amazon
Glacier
Amazon
CloudSearch
HDFS on Amazon EMR
Cloud Database and Storage Tier — Use the Right Tool for the Job!
Structured – Simple Query
NoSQL
Amazon DynamoDB
Cache
Amazon ElastiCache
(Memcached, Redis)
Structured – Complex Query
SQL
Amazon RDS
Data Warehouse
Amazon Redshift
Search
Amazon CloudSearch
Unstructured – No Query
Cloud Storage
Amazon S3
Amazon Glacier
Unstructured – Custom Query
Hadoop/HDFS
Amazon Elastic Map Reduce
DataStructureComplexity
Query Structure Complexity
What Database and Storage Should I Use?
What Data Store Should I Use?
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
CloudSearch
Amazon
EMR (HDFS)
Amazon S3 Amazon Glacier
Average
latency
ms ms ms, sec ms,sec sec,min,h
rs
ms,sec,min
(~ size)
hrs
Data volume GB GB–TBs
(no limit)
GB–TB
(3 TB Max)
GB–TB GB–PB
(~nodes)
GB–PB
(no limit)
GB–PB
(no limit)
Item size B-KB KB
(64 KB max)
KB
(~rowsize)
KB
(1 MB max)
MB-GB KB-GB
(5 TB max)
GB
(40 TB max)
Request rate Very High Very High High High Low –
Very High
Low–
Very High
(no limit)
Very Low
(no limit)
Storage cost
$/GB/month
$$ ¢¢ ¢¢ $ ¢ ¢ ¢
Durability Low -
Moderate
Very High High High High Very High Very High
Hot Data Warm Data Cold Data
Store anything
Object storage
Scalable
Designed for 99.999999999% durability
Amazon
S3
Aggregate All Data in S3 Surrounded by a collection of the right tools
EMR Kinesis
Redshift DynamoDB RDS
Data Pipeline
Spark
Streaming
Cassandra Storm
S3
• No limit on the number of objects
• Object size up to 5TB
• Central data storage for all
systems
• High bandwidth
• 99.999999999% durability
• Versioning, Lifecycle Policies
• Glacier Integration
Fully-managed NoSQL database service
Built on solid-state drives (SSDs)
Consistent low latency performance
Any throughput rate
No storage limits
Amazon
DynamoDB
DynamoDB: Managed High Availability and Durability
• Scaling without down-time
• Automatic sharding
• Security inspections, patches,
upgrades
• Automatic hardware failover
• Multi-AZ replication
• Hardware configuration
designed specifically for
DynamoDB
• Performance tuning
Relational Databases
Fully managed; zero admin
MySQL, PostgreSQL, Oracle, SQL Server
and Aurora
Amazon
RDS
Process and Analyze
Characterizing HPC
Embarrassingly parallel
Elastic
Batch workloads
Loosely
Coupled
Interconnected jobs
Network sensitivity
Job specific algorithms
Tightly
Coupled
Data management
Task distribution
Workflow management
Supporting
Services
EC2 instance types
Auto Scaling
CLI, User Data
SQL, NoSQL, Object Stores
Simple Queuing Service
Simple Workflow Service
Implement MPI
GPU Programming
Tightly coupled
Compute and Memory optimized instances
Implement HVM process execution
Intel® Xeon® processors
10 Gigabit Ethernet with Enhanced Networking, SR-IOV
c4.8xlarge
36 vCPUs
2.9 GHz Intel Xeon
E5-2666v3 Haswell
60 GB RAM
4000 Mbps dedicated
EBS throughput
r3.8xlarge
32 vCPUs
2.8 GHz Intel Xeon
E5-2670v2 Ivy Bridge
244GB RAM
2 x 320 GB
Local SSD
GPU Processing
CG1 instances
Intel® Xeon® X5570 processors
2 x NVIDIA Tesla M2050 GPUs
CUDA, OpenCL
G2 instances
Intel® Xeon® E5-2670 processors
4 x NVIDIA GRID K520 GPUs
Each GPU with 1536 CUDA cores and 4GB
of video memory
DirectX, OpenGL
Network Placement Groups
Cluster instances deployed in a Placement Group enjoy
low latency, full bisection 10 Gbps bandwidth
Connect multiple Placement Groups to create very large
clusters
Enhanced networking with SR-IOV provides higher
packet per second (PPS) performance, lower inter-
instance latencies, and very low network jitter
10Gbps
Low cost with flexible pricing Efficient clusters
Unlimited infrastructure
Faster time to results
Concurrent Clusters on-demand
Increased collaboration
Why AWS for HPC?
Customers running HPC workloads on AWS
Popular HPC workloads on AWS
Genome
processing
Modeling and
Simulation
Government and
Educational Research
Monte Carlo
Simulations
Transcoding and
Encoding
Computational
Chemistry
Across several key industry verticals
Utilities Biopharma Materials Design
Manufacturing Academic
Research
Auto & Aerospace
TOP500: 76th fastest supercomputer on-demand
Jun 2014 Top 500 list
484.2 TFlop/s
26,496 cores in a cluster of
EC2 C3 instances
Intel Xeon E5-2680v2 10C
2.800GHzprocessors
LinPack Benchmark
Three types of data-driven development
Retrospective/
Batch
analysis and
reporting
Here-and-now/
Stream
real-time processing
and dashboards
Predictions
to enable smart
applications
Amazon Kinesis
Amazon EC2
AWS Lambda
Amazon Redshift
Amazon RDS
Amazon EMR
Amazon Machine
Learning
Columnar data warehouse
ANSI SQL compatible
Massively parallel
Petabyte scale
Fully-managed
Very cost-effective
Amazon
Redshift
Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via
Amazon S3
– Parallel load from Amazon DynamoDB
• Hardware optimized for data processing
• Two hardware platforms
– DW1: HDD; scale from 2TB to 1.6PB
– DW2: SSD; scale from 160GB to 256TB
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
• With row storage you do
unnecessary I/O
• To get total amount, you have to
read everything
Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• With column storage, you only
read the data you need
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Columnar compression saves
space & reduces I/O
• Amazon Redshift analyzes and
compresses your data
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Direct-attached storage
• Large data block sizes
• Track of the minimum and
maximum value for each
block
• Skip over blocks that don’t
contain the data needed for
a given query
• Minimize unnecessary I/O
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Use direct-attached storage to
maximize throughput
• Hardware optimized for high
performance data processing
• Large block sizes to make the
most of each read
• Amazon Redshift manages
durability for you
Hadoop/HDFS clusters
Hive, Pig, Impala, HBase
Easy to use; fully managed
On-demand and spot pricing
Tight integration with S3,
DynamoDB, and Kinesis
Amazon
Elastic
MapReduce
EMR Cluster
S3
1. Put the data
into S3
2. Choose: Hadoop distribution, #
of nodes, types of nodes, Hadoop
apps like Hive/Pig/HBase
4. Get the output
from S3
3. Launch the cluster using
the EMR console, CLI, SDK, or
APIs
How Does EMR Work?
EMR
EMR Cluster
S3
You can easily resize the
cluster
And launch parallel
clusters using the same
data
How Does EMR Work?
EMR
EMR Cluster
S3
Use Spot
nodes to save
time and
money
How Does EMR Work?
The Hadoop Ecosystem works inside of EMR
Easy to use, managed machine learning
service built for developers
Robust, powerful machine learning
technology based on Amazon’s internal
systems
Create models using your data already
stored in the AWS cloud
Deploy models to production in seconds
Amazon
Machine
Learning
Three Supported Types of Predictions
• Binary Classification
– Predict the answer to a Yes/No question
• Multi-class classification
– Predict the correct category from a list
• Regression
– Predict the value of a numeric variable
Partners – Advances Analytics (Scientific, algorithmic, predictive, etc)
Visualize
AWS Partners for BI & Data Visualization
Putting it all together
Spark
Streaming,
Apache
Storm,
Native Amazon
Redshift
Spark, Impala,
Presto
Hive
Amazon
Redshift
Hive
Spark,
Presto
Amazon
Kinesis/
Kafka
Amazon
DynamoDB
Amazon S3Data
Hot ColdData Temperature
QueryLatency
Low
High
HDFS
Hive
Native
Client
Data Temperature vs Query Latency
Real-Time Batch
Answers
Amazon EMR
as ETL Grid
and Analysis
Amazon Redshift –
Production DWH
VisualizationLogs
Traffic Statistics
Demo
Video Recording of Demo
https://aws.amazon.com/big-data/
https://aws.amazon.com/testdrive/bigdata/
https://aws.amazon.com/training/course-
descriptions/bigdata/
https://aws.amazon.com/hpc/
Thank You!

More Related Content

What's hot

Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationDatabase Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationAmazon Web Services
 
AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large...
AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large...AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large...
AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large...Amazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSAmazon Web Services
 
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksDeep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksAmazon Web Services
 
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...Amazon Web Services
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Amazon Web Services
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSAmazon Web Services
 
Building and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSBuilding and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSAmazon Web Services
 
Building Big Data Applications on AWS
Building Big Data Applications on AWSBuilding Big Data Applications on AWS
Building Big Data Applications on AWSAmazon Web Services
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...Amazon Web Services
 
HSBC and AWS Day - Big Data and HPC on AWS
HSBC and AWS Day - Big Data and HPC on AWSHSBC and AWS Day - Big Data and HPC on AWS
HSBC and AWS Day - Big Data and HPC on AWSAmazon Web Services
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...Amazon Web Services
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...Amazon Web Services
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWSAmazon Web Services
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS Amazon Web Services
 

What's hot (20)

Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationDatabase Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
 
AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large...
AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large...AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large...
AWS re:Invent 2016: Case Study: How Monsanto Uses Amazon EFS with Their Large...
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksDeep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
 
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
ENT313 Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum E...
 
HPC in AWS - Technical Workshop
HPC in AWS - Technical WorkshopHPC in AWS - Technical Workshop
HPC in AWS - Technical Workshop
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
 
Cost Optimization at Scale
Cost Optimization at ScaleCost Optimization at Scale
Cost Optimization at Scale
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
 
Building and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSBuilding and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECS
 
Building Big Data Applications on AWS
Building Big Data Applications on AWSBuilding Big Data Applications on AWS
Building Big Data Applications on AWS
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
 
HSBC and AWS Day - Big Data and HPC on AWS
HSBC and AWS Day - Big Data and HPC on AWSHSBC and AWS Day - Big Data and HPC on AWS
HSBC and AWS Day - Big Data and HPC on AWS
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
 
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWS
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
 

Viewers also liked

(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014Amazon Web Services
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudAmazon Web Services
 
TopQuants2014_JokTang_DenysSemagin
TopQuants2014_JokTang_DenysSemaginTopQuants2014_JokTang_DenysSemagin
TopQuants2014_JokTang_DenysSemaginDenys Semagin
 
(NET302) Delivering a DBaaS Using Advanced AWS Networking
(NET302) Delivering a DBaaS Using Advanced AWS Networking(NET302) Delivering a DBaaS Using Advanced AWS Networking
(NET302) Delivering a DBaaS Using Advanced AWS NetworkingAmazon Web Services
 
(SEC325) Satisfy PCI Obligations While Continuing to Innovate
(SEC325) Satisfy PCI Obligations While Continuing to Innovate(SEC325) Satisfy PCI Obligations While Continuing to Innovate
(SEC325) Satisfy PCI Obligations While Continuing to InnovateAmazon Web Services
 
(DEV309) Large-Scale Metrics Analysis in Ruby
(DEV309) Large-Scale Metrics Analysis in Ruby(DEV309) Large-Scale Metrics Analysis in Ruby
(DEV309) Large-Scale Metrics Analysis in RubyAmazon Web Services
 
Deep Dive: Infrastructure as Code
Deep Dive: Infrastructure as CodeDeep Dive: Infrastructure as Code
Deep Dive: Infrastructure as CodeAmazon Web Services
 
AWS Seminar Series 2015 Brisbane
AWS Seminar Series 2015 BrisbaneAWS Seminar Series 2015 Brisbane
AWS Seminar Series 2015 BrisbaneAmazon Web Services
 
Architecting Hybrid Infrastructure
Architecting Hybrid InfrastructureArchitecting Hybrid Infrastructure
Architecting Hybrid InfrastructureAmazon Web Services
 
(DVO303) Scaling Infrastructure Operations with AWS
(DVO303) Scaling Infrastructure Operations with AWS(DVO303) Scaling Infrastructure Operations with AWS
(DVO303) Scaling Infrastructure Operations with AWSAmazon Web Services
 
(CMP302) Amazon ECS: Distributed Applications at Scale
(CMP302) Amazon ECS: Distributed Applications at Scale(CMP302) Amazon ECS: Distributed Applications at Scale
(CMP302) Amazon ECS: Distributed Applications at ScaleAmazon Web Services
 
Account Separation and Mandatory Access Control on AWS
Account Separation and Mandatory Access Control on AWSAccount Separation and Mandatory Access Control on AWS
Account Separation and Mandatory Access Control on AWSAmazon Web Services
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big DataAmazon Web Services
 
Measuring HPC: Performance, Cost, & Value
Measuring HPC: Performance, Cost, & ValueMeasuring HPC: Performance, Cost, & Value
Measuring HPC: Performance, Cost, & Valueinside-BigData.com
 
Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivAmazon Web Services
 
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012Amazon Web Services
 

Viewers also liked (20)

(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS Cloud
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
TopQuants2014_JokTang_DenysSemagin
TopQuants2014_JokTang_DenysSemaginTopQuants2014_JokTang_DenysSemagin
TopQuants2014_JokTang_DenysSemagin
 
Welcome enterprise summit
Welcome enterprise summitWelcome enterprise summit
Welcome enterprise summit
 
(NET302) Delivering a DBaaS Using Advanced AWS Networking
(NET302) Delivering a DBaaS Using Advanced AWS Networking(NET302) Delivering a DBaaS Using Advanced AWS Networking
(NET302) Delivering a DBaaS Using Advanced AWS Networking
 
(SEC325) Satisfy PCI Obligations While Continuing to Innovate
(SEC325) Satisfy PCI Obligations While Continuing to Innovate(SEC325) Satisfy PCI Obligations While Continuing to Innovate
(SEC325) Satisfy PCI Obligations While Continuing to Innovate
 
(DEV309) Large-Scale Metrics Analysis in Ruby
(DEV309) Large-Scale Metrics Analysis in Ruby(DEV309) Large-Scale Metrics Analysis in Ruby
(DEV309) Large-Scale Metrics Analysis in Ruby
 
Deep Dive: Infrastructure as Code
Deep Dive: Infrastructure as CodeDeep Dive: Infrastructure as Code
Deep Dive: Infrastructure as Code
 
AWS Seminar Series 2015 Brisbane
AWS Seminar Series 2015 BrisbaneAWS Seminar Series 2015 Brisbane
AWS Seminar Series 2015 Brisbane
 
Architecting Hybrid Infrastructure
Architecting Hybrid InfrastructureArchitecting Hybrid Infrastructure
Architecting Hybrid Infrastructure
 
(DVO303) Scaling Infrastructure Operations with AWS
(DVO303) Scaling Infrastructure Operations with AWS(DVO303) Scaling Infrastructure Operations with AWS
(DVO303) Scaling Infrastructure Operations with AWS
 
(CMP302) Amazon ECS: Distributed Applications at Scale
(CMP302) Amazon ECS: Distributed Applications at Scale(CMP302) Amazon ECS: Distributed Applications at Scale
(CMP302) Amazon ECS: Distributed Applications at Scale
 
Account Separation and Mandatory Access Control on AWS
Account Separation and Mandatory Access Control on AWSAccount Separation and Mandatory Access Control on AWS
Account Separation and Mandatory Access Control on AWS
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
 
Bv asw presentation
Bv asw presentationBv asw presentation
Bv asw presentation
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Measuring HPC: Performance, Cost, & Value
Measuring HPC: Performance, Cost, & ValueMeasuring HPC: Performance, Cost, & Value
Measuring HPC: Performance, Cost, & Value
 
Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel Aviv
 
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
 

Similar to Getting Started with Big Data and HPC in the Cloud - August 2015

Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseAmazon Web Services
 
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?Amazon Web Services Korea
 
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web Services
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web ServicesAWS March 2016 Webinar Series - Managed Database Services on Amazon Web Services
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web ServicesAmazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksAmazon Web Services
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWSAmazon Web Services
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyershuguk
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...Sungmin Kim
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 

Similar to Getting Started with Big Data and HPC in the Cloud - August 2015 (20)

AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?2017 AWS DB Day |  AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
2017 AWS DB Day | AWS 데이터베이스 개요 - 나의 업무에 적합한 데이터베이스는?
 
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web Services
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web ServicesAWS March 2016 Webinar Series - Managed Database Services on Amazon Web Services
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Recently uploaded (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Getting Started with Big Data and HPC in the Cloud - August 2015

  • 1. Getting Started with Big Data and HPC in the Cloud KD Singh AWS Solutions Architect kdsingh@amazon.com
  • 2. BIG DATA : Big Data Challenges: Capacity Planning & Scalability Lower Cost, OpEx Experiment & learn more Trad. DWHs IT Complexity Data Variety… ..Volume, velocity Old Answers & Questions Managed Services Fully managed, secured & automated services that brings agility & focus S3, EMR, Kinesis, DynamoDB: Collect all data, do Complex computations and processing it, both in Real-Time & Batch Sensors (IoT) Social Images Videos E. Apps. Documents Web Logs Big Value Redshift: Super Fast, MPP, Petabyte Ready, analytical Data Warehouse available in minutes Virtually unlimited & Elastic Resources No heavy lifting & Reduced Time to Market, parallel processing on demand New Answers/questions & Business Ideas Extract the meaning from all your data & focus on new business Ideas, Models, etc.. High Cost & Commitment
  • 3. Plethora of Tools Glacier S3 DynamoDB RDS EMR Redshift Data Pipeline Kinesis Cassandra CloudSearch
  • 4. Ingest Store Analyze Visualize Data Answers Time Simplify Data Analytics Flow Multiple stages Storage decoupled from processing
  • 5. Glacier S3 DynamoDB RDS Kinesis Spark Streaming EMR Ingest Store Process/Analyze Visualize Data Pipeline Storm Kafka Redshift Cassandra CloudSearch Kinesis Connector Kinesis enabled app App Server Web Server Devices
  • 6. AWS Big Data Portfolio Collect / Ingest Kinesis Process / Analyze EMR EC2 Redshift Data Pipeline Visualize / ReportStore Glacier S3 DynamoDB RDS Import Export Direct Connect Amazon SQS
  • 7. Industries using AWS for data analysis Mobile / Cable Telecom Oil and Gas Industrial Manufacturing Retail/Consumer Entertainment Hospitality Life Sciences Scientific Exploration Financial Services Publishing Media Advertising Online Media Social Network Gaming
  • 8. Ingest: The act of collecting and storing data
  • 9. Types of Data Ingest • Transactional – Database reads/writes • File – Media files, log files • Stream – Click-stream logs (sets of events) Database Cloud Storage Stream Storage LoggingFrameworksDevicesApps
  • 10. Real-time processing of streaming data High throughput Elastic Easy to use Connectors for EMR, S3, Redshift, DynamoDB Amazon Kinesis
  • 11. Kinesis Stream: Managed ability to capture and store data • Streams are made of Shards • Each Shard ingests data up to 1MB/sec, and up to 1000 TPS • Each Shard emits up to 2 MB/sec • All data is stored for 24 hours • Scale Kinesis streams by adding or removing Shards • Replay data inside of 24Hr. Window • Kinesis Client Library (KCL) - Java client library, available on Github
  • 12. Sending and Reading Data from Kinesis Streams HTTP Post AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce Sending Reading Write Read
  • 13. AWS Partners for Ingest, Data Load and Transformation Hparser, Big Data Edition Flume, Sqoop
  • 15. Cloud Database and Storage Tier Anti-pattern App/Web Tier Client Tier Database & Storage Tier
  • 16. Cloud Database and Storage Tier — Use the Right Tool for the Job! App/Web Tier Client Tier Data Tier Database & Storage Tier Search Hadoop/HDFS Cache Blob Store SQL NoSQL
  • 17. Database & Storage Tier Amazon RDSAmazon DynamoDB Amazon ElastiCache Amazon S3 Amazon Glacier Amazon CloudSearch HDFS on Amazon EMR Cloud Database and Storage Tier — Use the Right Tool for the Job!
  • 18. Structured – Simple Query NoSQL Amazon DynamoDB Cache Amazon ElastiCache (Memcached, Redis) Structured – Complex Query SQL Amazon RDS Data Warehouse Amazon Redshift Search Amazon CloudSearch Unstructured – No Query Cloud Storage Amazon S3 Amazon Glacier Unstructured – Custom Query Hadoop/HDFS Amazon Elastic Map Reduce DataStructureComplexity Query Structure Complexity What Database and Storage Should I Use?
  • 19. What Data Store Should I Use? Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon CloudSearch Amazon EMR (HDFS) Amazon S3 Amazon Glacier Average latency ms ms ms, sec ms,sec sec,min,h rs ms,sec,min (~ size) hrs Data volume GB GB–TBs (no limit) GB–TB (3 TB Max) GB–TB GB–PB (~nodes) GB–PB (no limit) GB–PB (no limit) Item size B-KB KB (64 KB max) KB (~rowsize) KB (1 MB max) MB-GB KB-GB (5 TB max) GB (40 TB max) Request rate Very High Very High High High Low – Very High Low– Very High (no limit) Very Low (no limit) Storage cost $/GB/month $$ ¢¢ ¢¢ $ ¢ ¢ ¢ Durability Low - Moderate Very High High High High Very High Very High Hot Data Warm Data Cold Data
  • 20. Store anything Object storage Scalable Designed for 99.999999999% durability Amazon S3
  • 21. Aggregate All Data in S3 Surrounded by a collection of the right tools EMR Kinesis Redshift DynamoDB RDS Data Pipeline Spark Streaming Cassandra Storm S3 • No limit on the number of objects • Object size up to 5TB • Central data storage for all systems • High bandwidth • 99.999999999% durability • Versioning, Lifecycle Policies • Glacier Integration
  • 22. Fully-managed NoSQL database service Built on solid-state drives (SSDs) Consistent low latency performance Any throughput rate No storage limits Amazon DynamoDB
  • 23. DynamoDB: Managed High Availability and Durability • Scaling without down-time • Automatic sharding • Security inspections, patches, upgrades • Automatic hardware failover • Multi-AZ replication • Hardware configuration designed specifically for DynamoDB • Performance tuning
  • 24. Relational Databases Fully managed; zero admin MySQL, PostgreSQL, Oracle, SQL Server and Aurora Amazon RDS
  • 26. Characterizing HPC Embarrassingly parallel Elastic Batch workloads Loosely Coupled Interconnected jobs Network sensitivity Job specific algorithms Tightly Coupled Data management Task distribution Workflow management Supporting Services EC2 instance types Auto Scaling CLI, User Data SQL, NoSQL, Object Stores Simple Queuing Service Simple Workflow Service Implement MPI GPU Programming
  • 27. Tightly coupled Compute and Memory optimized instances Implement HVM process execution Intel® Xeon® processors 10 Gigabit Ethernet with Enhanced Networking, SR-IOV c4.8xlarge 36 vCPUs 2.9 GHz Intel Xeon E5-2666v3 Haswell 60 GB RAM 4000 Mbps dedicated EBS throughput r3.8xlarge 32 vCPUs 2.8 GHz Intel Xeon E5-2670v2 Ivy Bridge 244GB RAM 2 x 320 GB Local SSD
  • 28. GPU Processing CG1 instances Intel® Xeon® X5570 processors 2 x NVIDIA Tesla M2050 GPUs CUDA, OpenCL G2 instances Intel® Xeon® E5-2670 processors 4 x NVIDIA GRID K520 GPUs Each GPU with 1536 CUDA cores and 4GB of video memory DirectX, OpenGL
  • 29. Network Placement Groups Cluster instances deployed in a Placement Group enjoy low latency, full bisection 10 Gbps bandwidth Connect multiple Placement Groups to create very large clusters Enhanced networking with SR-IOV provides higher packet per second (PPS) performance, lower inter- instance latencies, and very low network jitter 10Gbps
  • 30. Low cost with flexible pricing Efficient clusters Unlimited infrastructure Faster time to results Concurrent Clusters on-demand Increased collaboration Why AWS for HPC?
  • 31. Customers running HPC workloads on AWS
  • 32. Popular HPC workloads on AWS Genome processing Modeling and Simulation Government and Educational Research Monte Carlo Simulations Transcoding and Encoding Computational Chemistry
  • 33. Across several key industry verticals Utilities Biopharma Materials Design Manufacturing Academic Research Auto & Aerospace
  • 34. TOP500: 76th fastest supercomputer on-demand Jun 2014 Top 500 list 484.2 TFlop/s 26,496 cores in a cluster of EC2 C3 instances Intel Xeon E5-2680v2 10C 2.800GHzprocessors LinPack Benchmark
  • 35. Three types of data-driven development Retrospective/ Batch analysis and reporting Here-and-now/ Stream real-time processing and dashboards Predictions to enable smart applications Amazon Kinesis Amazon EC2 AWS Lambda Amazon Redshift Amazon RDS Amazon EMR Amazon Machine Learning
  • 36. Columnar data warehouse ANSI SQL compatible Massively parallel Petabyte scale Fully-managed Very cost-effective Amazon Redshift
  • 37. Amazon Redshift architecture • Leader Node – SQL endpoint – Stores metadata – Coordinates query execution • Compute Nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3 – Parallel load from Amazon DynamoDB • Hardware optimized for data processing • Two hardware platforms – DW1: HDD; scale from 2TB to 1.6PB – DW2: SSD; scale from 160GB to 256TB 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 38. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes ID Age State Amount 123 20 CA 500 345 25 WA 250 678 40 FL 125 957 37 WA 375 • With row storage you do unnecessary I/O • To get total amount, you have to read everything
  • 39. Amazon Redshift dramatically reduces I/O • Data compression • Zone maps • Direct-attached storage • Large data block sizes • With column storage, you only read the data you need ID Age State Amount 123 20 CA 500 345 25 WA 250 678 40 FL 125 957 37 WA 375
  • 40. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Columnar compression saves space & reduces I/O • Amazon Redshift analyzes and compresses your data analyze compression listing; Table | Column | Encoding ---------+----------------+---------- listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw
  • 41. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Direct-attached storage • Large data block sizes • Track of the minimum and maximum value for each block • Skip over blocks that don’t contain the data needed for a given query • Minimize unnecessary I/O
  • 42. Amazon Redshift dramatically reduces I/O • Column storage • Data compression • Zone maps • Direct-attached storage • Large data block sizes • Use direct-attached storage to maximize throughput • Hardware optimized for high performance data processing • Large block sizes to make the most of each read • Amazon Redshift manages durability for you
  • 43. Hadoop/HDFS clusters Hive, Pig, Impala, HBase Easy to use; fully managed On-demand and spot pricing Tight integration with S3, DynamoDB, and Kinesis Amazon Elastic MapReduce
  • 44. EMR Cluster S3 1. Put the data into S3 2. Choose: Hadoop distribution, # of nodes, types of nodes, Hadoop apps like Hive/Pig/HBase 4. Get the output from S3 3. Launch the cluster using the EMR console, CLI, SDK, or APIs How Does EMR Work?
  • 45. EMR EMR Cluster S3 You can easily resize the cluster And launch parallel clusters using the same data How Does EMR Work?
  • 46. EMR EMR Cluster S3 Use Spot nodes to save time and money How Does EMR Work?
  • 47. The Hadoop Ecosystem works inside of EMR
  • 48. Easy to use, managed machine learning service built for developers Robust, powerful machine learning technology based on Amazon’s internal systems Create models using your data already stored in the AWS cloud Deploy models to production in seconds Amazon Machine Learning
  • 49. Three Supported Types of Predictions • Binary Classification – Predict the answer to a Yes/No question • Multi-class classification – Predict the correct category from a list • Regression – Predict the value of a numeric variable
  • 50. Partners – Advances Analytics (Scientific, algorithmic, predictive, etc)
  • 52. AWS Partners for BI & Data Visualization
  • 53. Putting it all together
  • 54. Spark Streaming, Apache Storm, Native Amazon Redshift Spark, Impala, Presto Hive Amazon Redshift Hive Spark, Presto Amazon Kinesis/ Kafka Amazon DynamoDB Amazon S3Data Hot ColdData Temperature QueryLatency Low High HDFS Hive Native Client Data Temperature vs Query Latency Real-Time Batch Answers
  • 55.
  • 56. Amazon EMR as ETL Grid and Analysis Amazon Redshift – Production DWH VisualizationLogs Traffic Statistics Demo