© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Big Data and High Performance Computing
Solutions in the AWS Cloud
Ben Butler, Sr. Mgr. Big Data & HPC Marketing
@bensbutler March 26, 2014
Tell us:
What’s good, what’s not
What you want to see at these events
What you want AWS to deliver for you
Your feedback is very important to us
Big Data HPC
Customer Success
Story
Getting Started on
AWS
What we’ll cover today…
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
GB TB
PB
95% of the 1.2 zettabytes of data in the digital universe is unstructured
70% of this is user-generated content
Unstructured data growth is explosive, with an estimated compound annual growth rate (CAGR) of 62% from 2008 to 2012. Source: IDC
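To make the 62% CAGR figure concrete, the sketch below compounds it over the quoted period. The growth rate comes from the slide; treating 2008 to 2012 as four compounding periods, and the function name, are assumptions made for illustration.

```python
def compound_growth_factor(cagr: float, years: int) -> float:
    """Total growth multiple after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


# 0.62 is the CAGR quoted on the slide (source: IDC).
# Four compounding periods for 2008-2012 is an assumption.
factor = compound_growth_factor(0.62, 4)
print(f"Growth multiple over 4 years at 62% CAGR: {factor:.2f}x")
```

Under these assumptions the volume of unstructured data grows to nearly seven times its starting size over the period.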
ZB
EB
Big Data: Unconstrained data growth
Lower cost,
higher throughput Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Customer segmentation
Marketing spend optimization
Financial modeling & forecasting
Ad targeting & real time bidding
Clickstream analysis
Fraud detection
Use Cases
Visits, views, clicks, purchases
Source, device, location, time
Latency, throughput, uptime
Likes, shares, friends, follows
Price, frequency
Metrics
Relational
NoSQL
Web servers
Mobile phones
Tablets
3rd party feeds
Sources
Structured
Unstructured
Text
Binary
Near Real-time
Batched
Formats
Reporting
Dashboards
Sentiment
Clustering
Machine Learning
Optimization
Analysis
Lower cost,
higher throughput
Highly
constrained
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Generated data
Available for analysis
Data volume
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Elastic and highly scalable
No upfront capital expense
Only pay for what you use
+
+
Available on-demand
+
=
Remove
constraints
Accelerated
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Technologies and techniques for
working productively with data,
at any scale.
Big Data
Big data and AWS cloud computing
Big data Cloud computing
Variety, volume, and velocity
requiring new tools
Variety of compute, storage,
and networking options
Big data and AWS cloud computing
Big data Cloud computing
Potentially massive datasets Massive, virtually unlimited capacity
Big data and AWS cloud computing
Big data Cloud computing
Iterative, experimental style of
data manipulation and analysis
Iterative, experimental style of
infrastructure deployment/usage
Big data and AWS cloud computing
Big data Cloud computing
Frequently not a steady-state
workload; peaks and valleys
At its most efficient with highly
variable workloads
Big data and AWS cloud computing
Big data Cloud computing
Absolute performance not as
critical as “time to results”; shared
resources are a bottleneck
Parallel compute projects allow each
workgroup to have more autonomy,
get faster results
Ease of use Lower costs
no capital investment
pay as you go
no subscriptions
only pay for what you use
Ease of use Lower costs
programmable
zero admin
easy to
configure
integrate with
existing tools
Ease of use Lower costs
One tool to rule them all
Use the right tools
Amazon
S3
Amazon
Kinesis
Amazon
DynamoDB
Amazon
Redshift
Amazon
Elastic
MapReduce
Store anything
Object storage
Scalable
99.999999999% durability
Amazon S3
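As a rough illustration of what eleven nines of durability means, the sketch below converts it into an expected number of lost objects per year. Reading the figure as an independent, per-object annual loss probability is a simplifying assumption, and the object count is hypothetical.

```python
from decimal import Decimal

# Durability figure from the slide (eleven nines).
DURABILITY = Decimal("0.99999999999")


def expected_annual_losses(object_count: int) -> Decimal:
    """Expected objects lost per year under a simple independent-loss model."""
    return Decimal(object_count) * (Decimal(1) - DURABILITY)


# Hypothetical workload: 10,000 stored objects.
losses = expected_annual_losses(10_000)
years_per_loss = Decimal(1) / losses
print(f"Expected losses per year: {losses}")
print(f"Average years per single lost object: {years_per_loss:,.0f}")
```

`Decimal` is used instead of `float` because subtracting a value this close to 1 in binary floating point introduces visible rounding error.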
Real-time processing
High throughput; elastic
Easy to use
EMR, S3, Redshift, DynamoDB
Integrations
Amazon
Kinesis
NoSQL Database
Seamless scalability
Zero admin
Single digit millisecond latency
Amazon
DynamoDB
Relational data warehouse
Massively parallel
Petabyte scale
Fully managed
$1,000/TB/Year
Amazon
Redshift
Hadoop/HDFS clusters
Hive, Pig, Impala, HBase
Easy to use; fully managed
On-demand and spot pricing
Tight integration with S3,
DynamoDB, and Kinesis
Amazon
Elastic
MapReduce
HDFS
Analytics
languages
Data
management
Amazon
Redshift
Amazon EMR
Amazon
RDS
Amazon S3 Amazon
DynamoDB
Amazon
Kinesis
Sources
Sources Data
Sources
AWS Data Pipeline
Bizo: Digital Ad. Tech Metering with Amazon Kinesis
Continuous Ad
Metrics Extraction
Incremental Ad
Statistics
Computation
Metering Record Archive
Ad Analytics Dashboard
Free steak campaign
Facebook page
Mars exploration ops
Consumer social app
Ticket pricing optimization SAP & SharePoint Securities Trading Data Archiving
Marketing web site Interactive TV apps Financial markets analytics
Consumer social app Big data analytics
Web site & media sharing
Disaster recovery
Media streaming Web and mobile apps
Streaming webcasts
Facebook app Consumer social app
Business line of sight Mobile analytics
IT operations Digital media
Core IT and media
Ground campaign
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Amazon
Glacier
S3
Amazon
DynamoDB
Amazon
RDS
Amazon
Redshift
AWS
Direct Connect
AWS
Storage Gateway
AWS
Import/ Export
Amazon
Kinesis Amazon EMR
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Amazon EC2 Amazon EMR
Amazon
Kinesis
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Amazon
CloudFront
AWS
CloudFormation
S3
Amazon
DynamoDB
Amazon
RDS
Amazon
Redshift
Amazon EC2 Amazon EMR
AWS
Data Pipeline
The right tools. At the right scale. At the right time.
Big Data HPC
Customer Success
Story
Getting Started on
AWS
What we’ll cover today…
Take a typical big computation task…
…that an average cluster is too small
(or simply takes too long to complete)…
…optimization of algorithms can give some leverage…
…and complete the task in hand…
Applying a large cluster…
…can sometimes be overkill and too expensive
AWS instance clusters can be
balanced to the job in hand…
…nor too large…
…nor too small…
…with multiple clusters running at the same time
Why AWS for HPC?
Low cost with flexible pricing Efficient clusters
Unlimited infrastructure
Faster time to results
Concurrent Clusters on-demand
Increased collaboration
Cluster compute instances
Implement HVM process execution
Intel® Xeon® processors
10 Gigabit Ethernet – C3 has Enhanced Networking (SR-IOV)
cc2.8xlarge
32 vCPUs
2.6 GHz Intel Xeon
E5-2670 Sandy Bridge
60.5 GB RAM
4 x 840 GB
Local HDD
c3.8xlarge
32 vCPUs
2.8 GHz Intel Xeon
E5-2680v2 Ivy Bridge
60GB RAM
2 x 320 GB
Local SSD
AWS High Performance Computing
c3.8xlarge
32 vCPUs
2.8 GHz Intel Xeon
E5-2680v2 Ivy Bridge
60GB RAM
2 x 320 GB
Local SSD
Top 500 Super Computer using Amazon EC2
64th fastest supercomputer, Nov 2013
26,496 Intel® Xeon® cores
Linpack Performance (Rmax) 484.2 TFlop/s
Theoretical (Rpeak) 593.5 TFlop/s
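The Rmax and Rpeak figures above can be combined into a Linpack efficiency, conventionally Rmax divided by Rpeak. The sketch below uses only the numbers on the slide; the variable names are illustrative, and the per-core figure simply divides Rmax by the listed core count.

```python
# Figures from the slide (Top500 list, Nov 2013).
RMAX_TFLOPS = 484.2    # measured Linpack performance
RPEAK_TFLOPS = 593.5   # theoretical peak
CORES = 26_496

# Linpack efficiency: fraction of theoretical peak actually sustained.
efficiency = RMAX_TFLOPS / RPEAK_TFLOPS

# Sustained performance per listed core, in GFlop/s.
gflops_per_core = RMAX_TFLOPS * 1000 / CORES

print(f"Linpack efficiency: {efficiency:.1%}")
print(f"Sustained GFlop/s per core: {gflops_per_core:.2f}")
```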
Network placement groups
Cluster instances deployed in a Placement
Group enjoy low latency, full bisection
10 Gbps bandwidth
10Gbps
AWS High Performance Computing
GPU compute instances
cg1.4xlarge
Intel® Xeon® X5570
33.5 ECUs
22.5GB RAM
2x NVIDIA GPU
448 Cores
3GB Mem
g2.2xlarge
Intel® Xeon E5-2670
8vCPUs
15GB RAM
1x NVIDIA GPU
1536 Cores
4GB Mem
G2 instances
1 NVIDIA Kepler GK104 GPU
I/O Performance: Very High (10 Gigabit Ethernet)
CG1 instances
2 x NVIDIA Tesla “Fermi” M2050 GPUs
I/O Performance: Very High (10 Gigabit Ethernet)
AWS High Performance Computing
HPC Partners and Apps
Making Production Cloud HPC easy from 64 cores to
…
Pharma
Johnson &
Johnson
Manufacturing
HGST, a Western
Digital Company
Financial Services
Pacific Life Insurance
Genomics
Life Technologies
Research
The Aerospace
Corporation
… 156,314 cores for better solar panel materials for $33k, not $68M
Amazon EC2
16,788 Spot
Instances
Amazon S3
4TB
Processed
Spot Instances
on all 8 Regions
1.21 PetaFLOPS
Intel SandyBridge
on CC2
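The headline numbers on this slide can be turned into a few simple ratios. The sketch below uses only the figures quoted above. Because the slide does not state how long the run lasted, no per-core-hour price is derived, and the variable names are illustrative.

```python
# Figures quoted on the slide.
CLOUD_COST_USD = 33_000
ON_PREMISES_COST_USD = 68_000_000
CORES = 156_314
SPOT_INSTANCES = 16_788

# How many times cheaper the cloud run was than the quoted alternative.
cost_ratio = ON_PREMISES_COST_USD / CLOUD_COST_USD

# Total spend divided by cores used (not a per-hour rate).
cost_per_core_usd = CLOUD_COST_USD / CORES

# Average cores per Spot Instance across the whole fleet.
avg_cores_per_instance = CORES / SPOT_INSTANCES

print(f"Cost ratio: about {cost_ratio:,.0f}x")
print(f"Spend per core for the run: ${cost_per_core_usd:.2f}")
print(f"Average cores per instance: {avg_cores_per_instance:.1f}")
```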
Big Data HPC
Customer Success
Story
Getting Started on
AWS
What we’ll cover today…
AWS Customer Success Story
David Hinz, Director Cloud and HPC Solutions
HGST, Inc
3/25/14
• Founded in 2003 through the combination of the hard drive
businesses of IBM, the inventor of the hard drive, and
Hitachi, Ltd. (“Hitachi”)
• Acquired by Western Digital in 2012
• More than 4,200 active worldwide patents
• Headquartered in San Jose, California
• Approximately 41,000 employees worldwide
• Develops innovative, advanced hard disk drives, enterprise-class
solid state drives, external storage solutions and services
• Delivers intelligent storage devices that tightly integrate hardware
and software to maximize solution performance
Capacity Enterprise
Performance Enterprise
Cloud & Datacenter
Enterprise SSD
(+3 acquisitions in 2013)
7200 RPM &
CoolSpin
HDDs
Ultrastar®
Ultrastar® &
MegaScale DC™
10K & 15K
HDDs
PCIe
SAS
April 2013
Zero to Cloud in Less Than 12 Months
By 31 Dec 2013:
• Cloud eMail – Microsoft Office365
• Cloud eMail archiving/eDiscovery
• External SingleSignOn (off VPN)
• Cloud File/Collaboration – BOX
• USe– Salesforce.com
• Integrated to save files in BOX
• Cloud – High Performance Computing
(HPC) on AWS
• Cloud – Big Data Platform on AWS
• Cloud – data mart and provisioning service,
with Amazon Redshift
Evolution of Data Centers and HPC @ HGST
SJC
Servo
Team
Japan
Servo
Team
US
Head
Team
US
HAMR
Team
MN
PCB
Team
Old
Mail
System
HGST
Datacenters
On Premise
Off Premise
HPC
Clusters
An Agile Enterprise Datacenter Integrating
On-Premise and Cloud Solutions
Servo
Team in
SJC
PCB
Team in
MN
HAMR
Team in
US
Head
Team in
US
Servo
Team in
Jpn
Evaluate New Storage
Technologies and
Solutions “In House”
(HDD, SSD, etc.)
HGST On Site
Business,
Production and
Enterprise
Computing
Silos of
Clusters
On Premise
Internal
Wikis,
etc.
Cloud
HPC: Molecular Dynamics Simulation
• HGST uses Molecular Dynamics Simulation for
R&D of materials and lubricants needed for HDDs
• Research to achieve higher memory densities,
faster read/write capabilities, smaller form factors
and lower power consumption
“Model Job Size” used at HGST

Job size     Complexity [atoms]   Number of time steps   Job type “frequency”
Small        300,000              100                    200 per day, 2 days per week, once or twice a month
Medium       300,000              1,000                  20 medium jobs during the day, 4 days per month
Large        300,000              30,000                 3 large jobs per day, 6 days per month
Very large   300,000              3,000,000              1 large job per month
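The job sizes in the table above differ only in the number of time steps, so the spread in compute demand is easy to quantify. The sketch below uses atoms multiplied by time steps ("atom-steps") as a proxy for work per job. That proxy is an assumption for illustration, not a metric stated by HGST.

```python
# (atoms, time steps) per model job size, taken from the table.
JOB_SIZES = {
    "Small": (300_000, 100),
    "Medium": (300_000, 1_000),
    "Large": (300_000, 30_000),
    "Very Large": (300_000, 3_000_000),
}


def atom_steps(atoms: int, steps: int) -> int:
    """Assumed work proxy: one unit per atom per time step."""
    return atoms * steps


work = {name: atom_steps(*size) for name, size in JOB_SIZES.items()}
baseline = work["Small"]

for name, units in work.items():
    print(f"{name:<11} {units:>16,} atom-steps  ({units // baseline:,}x Small)")
```

Under this proxy a Very Large job is 30,000 times the work of a Small one, which is the kind of spread that makes a single fixed-size cluster a poor fit.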
[Figure: job scheduling over time. Panels: “Before: Shared 512 Core Super Computer” and “Today: AWS EC2 CC2 (Max Total 512 core)”. Block labels: 512 core, 256 core, 128 core, 64 core, waiting. Axis: Time.]
All Jobs Run In Parallel on AWS → 1.67x Throughput Improvement
Shape Compute To Match Work To Be Done
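A minimal sketch of the idea behind shaping compute to the work: the same jobs finish sooner when each gets its own right-sized cluster than when they queue for one shared machine. The job mix and durations below are hypothetical, chosen only so the cores sum to 512. They are not HGST's workload, and the sketch does not reproduce the 1.67x figure.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    cores: int
    hours: float


def queued_makespan(jobs: List[Job]) -> float:
    """Shared machine: jobs run one after another, the rest wait."""
    return sum(job.hours for job in jobs)


def concurrent_makespan(jobs: List[Job]) -> float:
    """Right-sized clusters: all jobs start together on their own cores."""
    return max(job.hours for job in jobs)


# Hypothetical job mix; total cores = 512.
jobs = [
    Job("a", 256, 6.0),
    Job("b", 128, 3.0),
    Job("c", 64, 1.0),
    Job("d", 64, 1.0),
]

peak_cores = sum(job.cores for job in jobs)
speedup = queued_makespan(jobs) / concurrent_makespan(jobs)
print(f"Peak cores in use: {peak_cores}")
print(f"Queued: {queued_makespan(jobs)} h, concurrent: {concurrent_makespan(jobs)} h")
print(f"Throughput improvement for this mix: {speedup:.2f}x")
```

The improvement depends entirely on the job mix: the more uneven the job durations, the more time the queued model spends with cores idle or jobs waiting.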
HPC: Micro Magnetic Simulation
• Model new technologies for future
HGST HDD products
• Finite-difference time-domain (FDTD)
numerical analysis solver
– Accurate simulation of large, complex
models with many variable parameters and
materials
– Scale across large clusters
• AWS C3 Instances provide significant
improvement for both scalability and
simulation throughput
AWS C3 Instances Provided 1.5x or Better Simulation Performance
“Cloud HPC”: What’s Next…..
Deploy Graphical User Interfaces for
HPC Applications
• Pre and Post Processing in Cloud vs.
data migration back to local systems
AWS C3 Performance Validated Across
Many Applications
• Improve overall performance and reduce
monthly AWS compute bill
• “Reduce Data Search Parties”
– Stop playing “Where’s Waldo with your Data”
• “I know I have that data… somewhere?”
– Data Aggregation to a Common Platform with common access tools
• Improve Yields by Accessing More Data in a More Timely Manner
– By having end to end visibility to:
• Every test, every diagnostic and all info from all components of a product (internal and external)
– Speed up yield improvement ramp up on new products
– Improve steady state yield on existing products.
“Big Data” in Manufacturing
• Metrics:
– Collecting >2M manufacturing/testing binary files daily
– Collecting from ~500 tables across 6 databases → tens of millions of records daily
– Over 140 users to date in early piloting
– Over 150 attendees participated in BDP training
• Highlights: although early in the overall journey, HGST’s BDP is already demonstrating early benefits:
HGST: BDP Key Metrics and Highlights
Development Engineer: demonstrated the joining of data sets for detailed
logistics tracking, an analysis that is very difficult to conduct with current
systems
Ops Engineer: a recent production issue required detailed historical data. Current systems
did not have the required retention for this data. However, the team was able to pull the data
from the BDP in minutes, as opposed to 3+ weeks to pull the data from tape archive
Development Engineer: obtained technical data from the BDP in
hours as opposed to 3+ weeks to pull from tape archive
DATA SEARCH
PARTIES
YIELD
3. Tailor Data for Consumers
• With the base Big Data platform established, the focus shifts to enabling specific business use cases.
The typical pattern involves:
HGST’s BDP Journey
1. Collect Data
Core Data
Processing
4. Develop Consumers
Derived DB
Enriched
Hive
tables
Hadoop
Analytics Libraries
Dimension
Reduction
Hive
Batch Analytics
Python
R
...
Sampling
Custom
Websites
Specific reports /
visualizations
Specific
analytics
Coredata
Hive / API
Core Data
Processing
The next phase will help to build the
specific website/reports/visualizations
that are tailored to the specific
business use case
The core effort to date has focused on building
the platform, ingesting core data sets, providing
base visualization/data mining tools, and
beginning to prepare the data for specific use
cases
2. Update Core API
Early
Successes
From Here
Commercial HPC Applications: Cloud Ready?
• HPC environments desire Cloud Computing with in-
house machines
– “Hybrid” Data Centers
• On-premise workstations + clusters
(some legacy, some new) with
burst/over-flow/connection to Cloud
– EULAs
• Should Comprehend Cloud
• Should allow License server placement
in cloud and accessible on-premise
• Make it easy to add cloud computing to
current licenses
• Consumption Based Pricing
– No consistency across vendors
– Not aligned with time based consumption pricing
“We’ve Only Just Begun….”
• Current Results in less than 12 months
• Re-aligning Business Group Leadership, Development Teams,
Research and Development Teams on New Capabilities Model
• Demands and Uses Expected To Grow And Accelerate Market
Success
2013 “Heavy Lifting” Provides Foundation
for 2014 Acceleration
Big Data HPC
Customer Success
Story
Getting Started on
AWS
What we’ll cover today…
Solution
Architects
Professional
Services
Premium
Support
AWS Partner
Network (APN)
AWS is here to help
AWS Architecture Diagrams
https://aws.amazon.com/architecture/
Processing large amounts of parallel data using a scalable cluster
Use commonly-available cluster
scheduling tools, such as
Grid Engine or Condor
AWS Online Software Store
http://aws.amazon.com/marketplace
Big Data Case Studies
Learn from other AWS customers
https://aws.amazon.com/solutions/case-studies/big-data
AWS Online Software Store
https://aws.amazon.com/marketplace
AWS Marketplace
AWS Public Data Sets
Free access to big data sets
https://aws.amazon.com/publicdatasets
AWS in Education
https://aws.amazon.com/grants
AWS Grants Program
AWS Big Data Test Drives
APN Partner-provided labs
https://aws.amazon.com/testdrive/bigdata
Webinars, Bootcamps, and
Self-Paced Labs
https://aws.amazon.com/training
AWS Training & Events
https://aws.amazon.com/events
Big Data to AWS
Brand new course on Big Data
https://aws.amazon.com/training/course-descriptions/bigdata/
https://aws.amazon.com/big-data
https://aws.amazon.com/hpc
@bensbutler (both Twitter and LinkedIn)
Thank you!

More Related Content

What's hot

Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudAmazon Web Services
 
High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101Amazon Web Services
 
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...Amazon Web Services
 
High Performance Computing on AWS
High Performance Computing on AWSHigh Performance Computing on AWS
High Performance Computing on AWSAmazon Web Services
 
High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudAccubits Technologies
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Sujee Maniyam
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAmazon Web Services
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupAndrei Savu
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEDataWorks Summit/Hadoop Summit
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningDataWorks Summit/Hadoop Summit
 
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...Amazon Web Services
 
Getting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataGetting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataQubole
 
Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ...
Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ...Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ...
Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ...Spark Summit
 
AWS Webcast - Attunity Couchsurfing
AWS Webcast - Attunity CouchsurfingAWS Webcast - Attunity Couchsurfing
AWS Webcast - Attunity CouchsurfingAmazon Web Services
 
DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIsCisco DevNet
 
Cloud Storage Comparison: AWS vs Azure vs Google vs IBM
Cloud Storage Comparison: AWS vs Azure vs Google vs IBMCloud Storage Comparison: AWS vs Azure vs Google vs IBM
Cloud Storage Comparison: AWS vs Azure vs Google vs IBMRightScale
 

What's hot (20)

Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS Cloud
 
HPC on AWS
HPC on AWSHPC on AWS
HPC on AWS
 
High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101
 
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
 
High Performance Computing on AWS
High Performance Computing on AWSHigh Performance Computing on AWS
High Performance Computing on AWS
 
High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloud
 
EC2 Foundations - Laura Thomson
EC2 Foundations - Laura ThomsonEC2 Foundations - Laura Thomson
EC2 Foundations - Laura Thomson
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache Storm
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine Learning
 
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
AWS re:Invent 2016: How Mapbox Uses the AWS Edge to Deliver Fast Maps for Mob...
 
Amazon EMR
Amazon EMRAmazon EMR
Amazon EMR
 
Getting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataGetting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big Data
 
Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ...
Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ...Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ...
Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ...
 
Mhug apache storm
Mhug apache stormMhug apache storm
Mhug apache storm
 
AWS Webcast - Attunity Couchsurfing
AWS Webcast - Attunity CouchsurfingAWS Webcast - Attunity Couchsurfing
AWS Webcast - Attunity Couchsurfing
 
DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIs
 
Cloud Storage Comparison: AWS vs Azure vs Google vs IBM
Cloud Storage Comparison: AWS vs Azure vs Google vs IBMCloud Storage Comparison: AWS vs Azure vs Google vs IBM
Cloud Storage Comparison: AWS vs Azure vs Google vs IBM
 

Viewers also liked

Big Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudBig Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudAmazon Web Services
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewAmazon Web Services
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Fast & Furious: building HPC solutions in a nutshell
Fast & Furious: building HPC solutions in a nutshellFast & Furious: building HPC solutions in a nutshell
Fast & Furious: building HPC solutions in a nutshellVictor Haydin
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...inside-BigData.com
 
Big Data y el sector salud
Big Data y el sector saludBig Data y el sector salud
Big Data y el sector saludBEEVA_es
 
HPC Storage and IO Trends and Workflows
HPC Storage and IO Trends and WorkflowsHPC Storage and IO Trends and Workflows
HPC Storage and IO Trends and Workflowsinside-BigData.com
 
Apache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksApache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksHortonworks
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data HadoopApache Apex
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Súbete a la nube con Go To Cloud
Súbete a la nube con Go To CloudSúbete a la nube con Go To Cloud
Súbete a la nube con Go To CloudNombre Apellidos
 
La empresa en Internet y redes sociales: cómo estar presente sin morir en el ...
La empresa en Internet y redes sociales: cómo estar presente sin morir en el ...La empresa en Internet y redes sociales: cómo estar presente sin morir en el ...
La empresa en Internet y redes sociales: cómo estar presente sin morir en el ...Nombre Apellidos
 

Viewers also liked (20)

Big Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudBig Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS Cloud
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – Overview
 
High–Performance Computing
High–Performance ComputingHigh–Performance Computing
High–Performance Computing
 
HPC Market Update from IDC
HPC Market Update from IDCHPC Market Update from IDC
HPC Market Update from IDC
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
Enterprise architecture for big data projects
Enterprise architecture for big data projectsEnterprise architecture for big data projects
Enterprise architecture for big data projects
 
February 2014 HUG : Pig On Tez
February 2014 HUG : Pig On TezFebruary 2014 HUG : Pig On Tez
February 2014 HUG : Pig On Tez
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Fast & Furious: building HPC solutions in a nutshell
Fast & Furious: building HPC solutions in a nutshellFast & Furious: building HPC solutions in a nutshell
Fast & Furious: building HPC solutions in a nutshell
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Big Data y el sector salud
Big Data y el sector saludBig Data y el sector salud
Big Data y el sector salud
 
HPC Storage and IO Trends and Workflows
HPC Storage and IO Trends and WorkflowsHPC Storage and IO Trends and Workflows
HPC Storage and IO Trends and Workflows
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Apache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance BenchmarksApache Hive 0.13 Performance Benchmarks
Apache Hive 0.13 Performance Benchmarks
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Súbete a la nube con Go To Cloud
Súbete a la nube con Go To CloudSúbete a la nube con Go To Cloud
Súbete a la nube con Go To Cloud
 
Big data
Big dataBig data
Big data
 
Beamer k 18
Beamer k 18Beamer k 18
Beamer k 18
 
La empresa en Internet y redes sociales: cómo estar presente sin morir en el ...
La empresa en Internet y redes sociales: cómo estar presente sin morir en el ...La empresa en Internet y redes sociales: cómo estar presente sin morir en el ...
La empresa en Internet y redes sociales: cómo estar presente sin morir en el ...
 

Big Data and High Performance Computing Solutions in the AWS Cloud

  • 1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Big Data and High Performance Computing Solutions in the AWS Cloud. Ben Butler, Sr. Mgr. Big Data & HPC Marketing, @bensbutler. March 26, 2014
  • 2. Tell us: What’s good, what’s not What you want to see at these events What you want AWS to deliver for you Your feedback is very important to us
  • 3. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  • 4. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  • 5. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 6. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 7. Big Data: Unconstrained data growth. GB, TB, PB, EB, ZB. 95% of the 1.2 zettabytes of data in the digital universe is unstructured; 70% of this is user-generated content. Unstructured data growth is explosive, with estimates of compound annual growth rate (CAGR) at 62% from 2008 to 2012. Source: IDC
  • 8. Lower cost, higher throughput Generation Collection & storage Analytics & computation Collaboration & sharing
  • 9. Customer segmentation Marketing spend optimization Financial modeling & forecasting Ad targeting & real time bidding Clickstream analysis Fraud detection Use Cases
  • 10. Visits, views, clicks, purchases Source, device, location, time Latency, throughput, uptime Likes, shares, friends, follows Price, frequency Metrics
  • 14. Lower cost, higher throughput Highly constrained Generation Collection & storage Analytics & computation Collaboration & sharing
  • 15. Generated data Available for analysis Data volume Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • 16. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  • 17. Accelerated Generation Collection & storage Analytics & computation Collaboration & sharing
  • 18. Technologies and techniques for working productively with data, at any scale. Big Data
  • 19. Big data and AWS cloud computing. Big data: variety, volume, and velocity requiring new tools. Cloud computing: variety of compute, storage, and networking options.
  • 20. Big data and AWS cloud computing. Big data: potentially massive datasets. Cloud computing: massive, virtually unlimited capacity.
  • 21. Big data and AWS cloud computing. Big data: iterative, experimental style of data manipulation and analysis. Cloud computing: iterative, experimental style of infrastructure deployment/usage.
  • 22. Big data and AWS cloud computing. Big data: frequently not a steady-state workload; peaks and valleys. Cloud computing: at its most efficient with highly variable workloads.
  • 23. Big data and AWS cloud computing. Big data: absolute performance not as critical as “time to results”; shared resources are a bottleneck. Cloud computing: parallel compute projects allow each workgroup to have more autonomy, get faster results.
  • 25. no capital investment pay as you go no subscriptions only pay for what you use Ease of use Lower costs
  • 26. programmable zero admin easy to configure integrate with existing tools Ease of use Lower costs
  • 27. One tool to rule them all
  • 28. Use the right tools Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon Redshift Amazon Elastic MapReduce
  • 30. Real-time processing High throughput; elastic Easy to use EMR, S3, Redshift, DynamoDB Integrations Amazon Kinesis
  • 31. NoSQL Database Seamless scalability Zero admin Single digit millisecond latency Amazon DynamoDB
  • 32. Relational data warehouse Massively parallel Petabyte scale Fully managed $1,000/TB/Year Amazon Redshift
  • 33. Hadoop/HDFS clusters Hive, Pig, Impala, HBase Easy to use; fully managed On-demand and spot pricing Tight integration with S3, DynamoDB, and Kinesis Amazon Elastic MapReduce
  • 34. HDFS Analytics languages Data management Amazon Redshift Amazon EMR Amazon RDS Amazon S3 Amazon DynamoDB Amazon Kinesis Sources Sources Data Sources AWS Data Pipeline
  • 35. Bizo: Digital Ad. Tech Metering with Amazon Kinesis Continuous Ad Metrics Extraction Incremental Ad Statistics Computation Metering Record Archive Ad Analytics Dashboard
  • 36. Free steak campaign Facebook page Mars exploration ops Consumer social app Ticket pricing optimization SAP & SharePoint Securities Trading Data Archiving Marketing web site Interactive TV apps Financial markets analytics Consumer social app Big data analytics Web site & media sharing Disaster recovery Media streaming Web and mobile apps Streaming webcasts Facebook app Consumer social app Business line of sight Mobile analytics IT operations Digital media Core IT and media Ground campaign
  • 37. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 38. Generation Collection & storage Analytics & computation Collaboration & sharing Amazon Glacier S3 Amazon DynamoDB Amazon RDS Amazon Redshift AWS Direct Connect AWS Storage Gateway AWS Import/Export Amazon Kinesis Amazon EMR
  • 39. Generation Collection & storage Analytics & computation Collaboration & sharing Amazon EC2 Amazon EMR Amazon Kinesis
  • 40. Generation Collection & storage Analytics & computation Collaboration & sharing Amazon CloudFront AWS CloudFormation S3 Amazon DynamoDB Amazon RDS Amazon Redshift Amazon EC2 Amazon EMR AWS Data Pipeline
  • 41. The right tools. At the right scale. At the right time.
  • 42. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  • 43. Take a typical big computation task…
  • 44. …that an average cluster is too small for (or simply takes too long to complete)…
  • 45. …optimization of algorithms can give some leverage…
  • 46. …and complete the task in hand…
  • 47. Applying a large cluster…
  • 48. …can sometimes be overkill and too expensive
  • 49. AWS instance clusters can be balanced to the job in hand…
  • 52. …with multiple clusters running at the same time
  • 53. Why AWS for HPC? Low cost with flexible pricing Efficient clusters Unlimited infrastructure Faster time to results Concurrent Clusters on-demand Increased collaboration
  • 54. AWS High Performance Computing Cluster compute instances Implement HVM process execution Intel® Xeon® processors 10 Gigabit Ethernet (C3 has Enhanced Networking, SR-IOV) cc2.8xlarge: 32 vCPUs, 2.6 GHz Intel Xeon E5-2670 Sandy Bridge, 60.5 GB RAM, 4 x 840 GB Local HDD c3.8xlarge: 32 vCPUs, 2.8 GHz Intel Xeon E5-2680v2 Ivy Bridge, 60 GB RAM, 2 x 320 GB Local SSD
  • 55. Top 500 Super Computer using Amazon EC2 64th fastest supercomputer, Nov 2013 26,496 Intel® Xeon® cores Linpack Performance (Rmax) 484.2 TFlop/s Theoretical (Rpeak) 593.5 TFlop/s c3.8xlarge: 32 vCPUs, 2.8 GHz Intel Xeon E5-2680v2 Ivy Bridge, 60 GB RAM, 2 x 320 GB Local SSD
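The Top 500 figures on this slide can be cross-checked with a little arithmetic. The sketch below computes the Linpack efficiency (Rmax divided by Rpeak) and rebuilds the theoretical peak from the core count and the 2.8 GHz clock quoted for c3.8xlarge. The value of 8 double-precision floating-point operations per cycle per core is an assumption (typical for AVX on this processor generation), not something stated on the slide.

```python
# Figures quoted on the slide (Nov 2013 Top 500 entry built on Amazon EC2).
CORES = 26_496
RMAX_TFLOPS = 484.2    # measured Linpack performance
RPEAK_TFLOPS = 593.5   # theoretical peak
CLOCK_GHZ = 2.8        # c3.8xlarge clock speed from the slide

# Assumption, not from the slide: 8 double-precision FLOPs per cycle per core.
FLOPS_PER_CYCLE = 8


def linpack_efficiency(rmax_tflops, rpeak_tflops):
    """Fraction of the theoretical peak achieved on the Linpack benchmark."""
    return rmax_tflops / rpeak_tflops


def theoretical_peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Peak = cores x clock (GHz) x FLOPs per cycle, converted from GFLOPS to TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


if __name__ == "__main__":
    eff = linpack_efficiency(RMAX_TFLOPS, RPEAK_TFLOPS)
    peak = theoretical_peak_tflops(CORES, CLOCK_GHZ, FLOPS_PER_CYCLE)
    print(f"Linpack efficiency: {eff:.1%}")
    print(f"Reconstructed Rpeak: {peak:.1f} TFlop/s")
```

Under that assumption the reconstructed peak agrees with the quoted Rpeak, and the efficiency comes out at roughly 82%.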
  • 56. Network placement groups Cluster instances deployed in a Placement Group enjoy low latency, full bisection 10 Gbps bandwidth 10Gbps AWS High Performance Computing
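Launching instances into a placement group is a matter of creating the group and then naming it in the launch request. The sketch below assumes the boto3 SDK (which postdates this talk) and uses its `create_placement_group` and `run_instances` calls; the AMI ID, group name, and helper function names are hypothetical placeholders. The request parameters are built in a separate pure function so they can be inspected without AWS credentials.

```python
def cluster_launch_params(ami_id, group_name, count, instance_type="c3.8xlarge"):
    """Build keyword arguments for EC2 run_instances targeting a placement group."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,          # request all-or-nothing so the cluster is complete
        "MaxCount": count,
        "Placement": {"GroupName": group_name},
    }


def launch_cluster(ami_id, group_name, count):
    """Create a cluster placement group and launch instances into it.

    Requires boto3 and configured AWS credentials; running it incurs charges.
    """
    import boto3  # imported lazily so the helper above works without the SDK

    ec2 = boto3.client("ec2")
    ec2.create_placement_group(GroupName=group_name, Strategy="cluster")
    return ec2.run_instances(**cluster_launch_params(ami_id, group_name, count))


if __name__ == "__main__":
    # Hypothetical values; nothing is sent to AWS here.
    print(cluster_launch_params("ami-00000000", "hpc-demo", 4))
```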
  • 57. AWS High Performance Computing GPU compute instances cg1.4xlarge: Intel® Xeon® X5570, 33.5 ECUs, 22.5 GB RAM, 2x NVIDIA GPU, 448 Cores, 3 GB Mem g2.2xlarge: Intel® Xeon® E5-2670, 8 vCPUs, 15 GB RAM, 1x NVIDIA GPU, 1536 Cores, 4 GB Mem G2 instances: 1 NVIDIA Kepler GK104 GPU, I/O Performance: Very High (10 Gigabit Ethernet) CG1 instances: 2 x NVIDIA Tesla “Fermi” M2050 GPUs, I/O Performance: Very High (10 Gigabit Ethernet)
  • 59. Making Production Cloud HPC easy from 64 cores to … Pharma Johnson & Johnson Manufacturing HGST, a Western Digital Company Financial Services Pacific Life Insurance Genomics Life Technologies Research The Aerospace Corporation … 156,314 cores for better solar panel materials for $33k, not $68M Amazon EC2 16,788 Spot Instances Amazon S3 4TB Processed Spot Instances on all 8 Regions 1.21 PetaFLOPS Intel SandyBridge on CC2
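The headline numbers on this slide imply a few derived figures worth spelling out. The sketch below takes the quoted values at face value ("$33k" as 33,000 USD and "$68M" as 68,000,000 USD, both presumably rounded) and computes the cost ratio, the cost per core for the run, and the average number of cores per Spot Instance.

```python
# Values quoted on the slide; the dollar amounts are rounded on the slide itself.
CORES = 156_314
SPOT_INSTANCES = 16_788
CLOUD_COST_USD = 33_000
ON_PREMISE_COST_USD = 68_000_000


def cost_ratio(on_premise_usd, cloud_usd):
    """How many times more the quoted alternative costs than the cloud run."""
    return on_premise_usd / cloud_usd


def cost_per_core(total_usd, cores):
    """Cost of the whole run divided by the number of cores used."""
    return total_usd / cores


def cores_per_instance(cores, instances):
    """Average core count per instance across the mixed fleet."""
    return cores / instances


if __name__ == "__main__":
    print(f"Cost ratio: {cost_ratio(ON_PREMISE_COST_USD, CLOUD_COST_USD):,.0f}x")
    print(f"Cost per core for the run: ${cost_per_core(CLOUD_COST_USD, CORES):.2f}")
    print(f"Average cores per instance: {cores_per_instance(CORES, SPOT_INSTANCES):.1f}")
```

The ratio is on the order of two thousand to one, and the average of about nine cores per instance suggests a mix of instance sizes rather than a uniform fleet.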
  • 60. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  • 61. AWS Customer Success Story David Hinz, Director Cloud and HPC Solutions HGST, Inc 3/25/14
  • 62. • Founded in 2003 through the combination of the hard drive businesses of IBM, the inventor of the hard drive, and Hitachi, Ltd (“Hitachi”) • Acquired by Western Digital in 2012 • More than 4,200 active worldwide patents • Headquartered in San Jose, California • Approximately 41,000 employees worldwide • Develops innovative, advanced hard disk drives, enterprise-class solid state drives, external storage solutions and services • Delivers intelligent storage devices that tightly integrate hardware and software to maximize solution performance Capacity Enterprise Performance Enterprise Cloud & Datacenter Enterprise SSD (+3 acquisitions in 2013) 7200 RPM & CoolSpin HDDs Ultrastar® Ultrastar® & MegaScale DC™ 10K & 15K HDDs PCIe SAS
  • 63. 6 April 2013 Zero to Cloud in less than 12 Months By 31 Dec 2013: • Cloud eMail – Microsoft Office365 • Cloud eMail archiving/eDiscovery • External SingleSignOn (off VPN) • Cloud File/Collaboration – BOX • USe– Salesforce.com • Integrated to save files in BOX • Cloud – High Performance Computing (HPC) on AWS • Cloud – Big Data Platform on AWS • Cloud – data mart and provisioning service, with Amazon Redshift
  • 64. Evolution of Data Centers and HPC @ HGST SJC Servo Team Japan Servo Team US Head Team US HAMR Team MN PCB Team Old Mail System HGST Datacenters On Premise Off Premise HPC Clusters An Agile Enterprise Datacenter Integrating On-Premise and Cloud Solutions Servo Team in SJC PCB Team in MN HAMR Team in US Head Team in US Servo Team in Jpn Evaluate New Storage Technologies and Solutions “In House” (HDD, SSD, etc.) HGST On Site Business, Production and Enterprise Computing Silos of Clusters On Premise Internal Wikis, etc. Cloud
  • 65. HPC: Molecular Dynamics Simulation • HGST uses Molecular Dynamics Simulation for R&D of materials and lubricants needed for HDDs • Research to achieve higher memory densities, faster read/write capabilities, smaller form factors and lower power consumption “Model Job Size” used at HGST:
      Job Type | Complexity [atoms] | Number of Time Steps | “Frequency”
      Small | 300,000 | 100 | 200 per day, 2 days per week once or twice a month
      Medium | 300,000 | 1,000 | 20 Medium jobs during the day, 4 days per month
      Large | 300,000 | 30,000 | 3 large jobs per day, 6 days per month
      Very Large | 300,000 | 3,000,000 | 1 large job per month
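The job-size table above can be turned into a rough picture of where the monthly compute demand goes. The sketch below uses atoms multiplied by time steps ("atom-steps") as a proxy for work per job; that proxy is an assumption, as real molecular dynamics cost also depends on the force field and neighbor lists. The Small job frequency on the slide is ambiguous, so 400 jobs per month (200 per day on 2 days) is assumed here.

```python
ATOMS = 300_000  # every job class on the slide uses the same model size

# time_steps come from the slide; jobs_per_month is derived from the "Frequency" column.
JOB_CLASSES = {
    "Small": {"time_steps": 100, "jobs_per_month": 200 * 2},      # assumption: 2 run days
    "Medium": {"time_steps": 1_000, "jobs_per_month": 20 * 4},
    "Large": {"time_steps": 30_000, "jobs_per_month": 3 * 6},
    "Very Large": {"time_steps": 3_000_000, "jobs_per_month": 1},
}


def atom_steps_per_job(atoms, time_steps):
    """Proxy for the work in one job: number of atoms times number of time steps."""
    return atoms * time_steps


def monthly_atom_steps(job_classes, atoms=ATOMS):
    """Total proxy work per month for each job class."""
    return {
        name: atom_steps_per_job(atoms, spec["time_steps"]) * spec["jobs_per_month"]
        for name, spec in job_classes.items()
    }


if __name__ == "__main__":
    monthly = monthly_atom_steps(JOB_CLASSES)
    total = sum(monthly.values())
    for name, work in monthly.items():
        print(f"{name:>10}: {work:.3e} atom-steps/month ({work / total:.1%})")
```

Under these assumptions the single Very Large job accounts for most of the monthly work, while the many Small and Medium jobs contribute little compute but dominate the job count, which is the kind of mix that benefits from right-sized clusters running in parallel.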
  • 66. Shape Compute To Match Work To Be Done [Figure: job timelines. Before: Shared 512 Core Super Computer, where 512-core and 64-core jobs queue and wait. Today: AWS EC2 CC2 (Max Total 512 core), with 512-, 256- and 128-core clusters.] All Jobs Run In Parallel on AWS: 1.67x Throughput Improvement
  • 67. HPC: Micro Magnetic Simulation • Model new technologies for future HGST HDD products • Finite-difference time-domain (FDTD) numerical analysis solver – Accurate simulation of large, complex models with many variable parameters and materials – Scale across large clusters • AWS C3 Instances provide significant improvement for both scalability and simulation throughput AWS C3 Instances Provided 1.5x or Better Simulation Performance
  • 68. “Cloud HPC”: What’s Next….. Deploy Graphics User Interfaces HPC Applications • Pre and Post Processing in Cloud vs. data migration back to local systems AWS C3 Performance Validated Across Many Applications • Improve overall performance and reduce monthly AWS compute bill
  • 69. “Big Data” in Manufacturing • “Reduce Data Search Parties” – Stop playing “Where’s Waldo” with your data • “I know I have that data... somewhere?” – Data aggregation to a common platform with common access tools • Improve Yields by Accessing More Data in a More Timely Manner – By having end-to-end visibility to: • Every test, every diagnostic and all info from all components of a product (internal and external) – Speed up yield improvement ramp-up on new products – Improve steady state yield on existing products.
  • 70. HGST: BDP Key Metrics and Highlights • Metrics: – Collecting >2M manufacturing/testing binary files daily – Collecting from ~500 tables across 6 databases → tens of millions of records daily – Over 140 users to date in early piloting – Over 150 attendees participated in BDP training • Highlights: although early in the overall journey, HGST’s BDP is already demonstrating early benefits: – Development Engineer: demonstrated the joining of data sets for detailed logistics tracking, an analysis that is very difficult to conduct with current systems – Ops Engineer: a recent production issue required detailed historical data. Current systems did not have the required retention for this data. However, the team was able to pull the data from the BDP in minutes, as opposed to 3+ weeks to pull the data from tape archive – Development Engineer: obtained technical data from the BDP in hours as opposed to 3+ weeks to pull from tape archive (Slide labels: Data Search Parties, Yield)
  • 71. HGST’s BDP Journey • With the base Big Data platform established, the focus shifts to enabling specific business use cases. The typical pattern involves: 1. Collect Data 2. Update Core API 3. Tailor Data for Consumers 4. Develop Consumers [Diagram components: Core data, Core Data Processing, Hive / API, Derived DB, Enriched Hive tables, Hadoop Analytics Libraries, Dimension Reduction, Sampling, Hive Batch Analytics, Python, R, ..., Custom Websites, Specific reports / visualizations, Specific analytics] Early Successes: The core effort to date has focused on building the platform, ingesting core data sets, providing base visualization/data mining tools, and beginning to prepare the data for specific use cases From Here: The next phase will help to build the specific website/reports/visualizations that are tailored to the specific business use case
  • 72. Commercial HPC Applications: Cloud Ready? • HPC environments desire Cloud Computing with in-house machines – “Hybrid” Data Centers • On-premise workstations + clusters (some legacy, some new) with burst/over-flow/connection to Cloud – EULAs • Should Comprehend Cloud • Should allow License server placement in cloud and accessible on-premise • Make it easy to add cloud computing to current licenses • Consumption Based Pricing – No consistency across vendors – Not aligned with time based consumption pricing
  • 73. “We’ve Only Just Begun….” • Current Results in less than 12 months • Re-aligning Business Group Leadership, Development Teams, Research and Development Teams on New Capabilities Model • Demands and Uses Expected To Grow And Accelerate Market Success 2013 “Heavy Lifting” Provides Foundation for 2014 Acceleration
  • 74. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  • 76. AWS Architecture Diagrams https://aws.amazon.com/architecture/ Processing large amounts of parallel data using a scalable cluster Use commonly-available cluster scheduling tools, such as Grid Engine or Condor
  • 77. Big Data Case Studies Learn from other AWS customers https://aws.amazon.com/solutions/case-studies/big-data
  • 78. AWS Online Software Store https://aws.amazon.com/marketplace AWS Marketplace
  • 79. AWS Public Data Sets Free access to big data sets https://aws.amazon.com/publicdatasets
  • 81. AWS Big Data Test Drives APN Partner-provided labs https://aws.amazon.com/testdrive/bigdata
  • 82. Webinars, Bootcamps, and Self-Paced Labs https://aws.amazon.com/training AWS Training & Events https://aws.amazon.com/events
  • 83. Big Data on AWS Brand new course on Big Data https://aws.amazon.com/training/course-descriptions/bigdata/
  • 84. https://aws.amazon.com/big-data https://aws.amazon.com/hpc @bensbutler (both Twitter and LinkedIn) Thank you!