© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pierre-Yves Aquilanti, Ph.D. – Senior HPC Specialized Solution Architect
Anh Tran – Senior HPC Specialized Solution Architect
Tuesday, August 7, 2018
High Performance Computing on AWS
Agenda
§ Overview of AWS Infrastructure
§ Why HPC on AWS
§ HPC Solution Components
§ Use Cases and Customer Stories
§ Security
§ Cost Optimization
AWS Global Infrastructure
AWS Global Infrastructure
Regions, connected by the Amazon Global Network, plus over 100 global CloudFront PoPs
• Redundant 100GbE network
• Redundant private capacity between all Regions except China
AWS Global Infrastructure
18 Regions – 54 Availability Zones – 114 Points of Presence*
Region (number of Availability Zones):
US West: Oregon (3), Northern California (3)
US East: N. Virginia (6), Ohio (3)
Canada: Central (2)
GovCloud: US-West (3)
South America: São Paulo (3)
EU: Ireland (3), Frankfurt (3), London (3), Paris (3)
Asia Pacific: Singapore (3), Sydney (3), Tokyo (4), Seoul (2), Mumbai (2)
China: Beijing (2), Ningxia (2)
Announced Regions: Bahrain, Hong Kong SAR (China), GovCloud (US-East)
*103 Edge Locations and 11 Regional Edge Caches
Why HPC on AWS?
Amazon Runs HPC Workloads Every Day
§ Logistics
§ Machine learning
§ Data center, network, and server design
§ Consumer product design
§ Robotics
§ Semiconductor design
§ Retail and financial analytics
AWS Advantages for HPC Workload Types
Workload types:
§ Tightly Coupled Parallel Computing
§ Loosely Coupled Parallel Computing
§ Accelerated Computing
§ Visualization and Interpretation
§ High Performance Data Storage and Analytics
AWS advantages:
§ Scale
§ EC2 Spot Pricing
§ Early Access to Technology
§ Choice
§ Performance
§ Derive unique insights with AI/ML
§ Skip the Queue
§ View results instantly
Why HPC on AWS - Multiple Clusters
On-Prem: every job waits in the same monolithic cluster queue
$ qsub -q monolith iwait.sh
On AWS: launch clusters by group, user, or application, so there is no more waiting!
$ qsub dev.sh
$ qsub prod.sh
$ qsub critical.sh
$ qsub bigrun.sh
Optimize application-specific infrastructure
Shape the compute to match the work to be done
§ General purpose: M5, M4, T2
§ Compute optimized: C5, C4
§ Storage and IO optimized: I3, D2
§ GPU and FPGA accelerated: P3, P2, F1
§ Memory optimized: X1, R4
Cost Advantages
On Premises: Capital Expense Model
§ High upfront capital cost
§ High cost of ongoing support
Amazon Web Services: Pay As You Go Model
§ Use only what you need
§ Multiple pricing models
EC2 Purchasing Options
On-Demand
Pay for compute capacity by the second with no long-term commitments
Best for: spiky workloads, or while you are still defining your needs
Reserved
Make a 1- or 3-year commitment and receive a significant discount off On-Demand prices
Best for: committed, steady-state usage
Spot
Spare EC2 capacity at savings of up to 90% off On-Demand prices
Best for: fault-tolerant, dev/test, time-flexible, stateless workloads
Per-second billing for EC2 Linux instances and EBS volumes
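To make per-second billing concrete, here is a minimal shell sketch that estimates the charge for one Linux instance run. It assumes the 60-second minimum that applies to per-second billing and takes the hourly rate as an input; the `billed_cost` name is illustrative, not AWS tooling, and the example rate is the $3.06/hour c5.18xlarge On-Demand figure from the ec2FleetCompare output in the Cost Optimization section.

```shell
#!/usr/bin/env bash
# Estimate the charge for one EC2 Linux instance run under per-second billing.
# Assumption: runs shorter than 60 seconds are billed as 60 seconds.
# Usage: billed_cost <hourly_rate_usd> <runtime_seconds>
billed_cost() {
  local rate=$1 seconds=$2
  # Apply the 60-second minimum charge.
  if (( seconds < 60 )); then
    seconds=60
  fi
  # Bash has no floating point, so awk does the dollar arithmetic.
  awk -v r="$rate" -v s="$seconds" 'BEGIN { printf "%.4f\n", r * s / 3600 }'
}
```

For example, `billed_cost 3.06 5400` estimates a 90-minute run, and a 30-second test run is charged as a full minute.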
Building an HPC Infrastructure in AWS
Understanding the Drivers
What are the motivations to use Cloud computing?
How would running on AWS differ from running on-premises?
What would you need to launch a PoC on AWS today?
What are the requirements for your application?
Do you need to visualize your data?
HPC Solutions
§ Compute: EC2 Instances, EC2 Spot, Auto Scaling
§ Accelerated Compute: FPGA, GPU
§ Storage: EBS, EFS, S3
§ Networking: Enhanced Networking, Placement Groups
§ Automation & Orchestration: AWS Batch, CfnCluster, NICE EnginFrame
§ Visualization: NICE DCV, AppStream 2.0
Amazon EC2 Instances
Optimize the price/performance of your HPC workloads with the widest range of compute instances
§ General purpose: M5, M4; burstable: T2, T2 Unlimited
§ Compute optimized: C5, C4
§ Memory optimized: R4, X1, X1e
§ Storage optimized: High I/O: I3, I2; Dense storage: D2, H1
§ Accelerated computing: GPU compute: P3, P2; Graphics intensive: G3, G2; FPGA: F1
§ EC2 Bare Metal: direct access to physical server resources
§ NEW! C5d, M5d, R5, R5d
AWS Storage is a Platform
§ Block: Amazon EBS, Amazon EC2 Instance Store
§ File: Amazon EFS
§ Object: Amazon S3 / S3-IA, Amazon Glacier
§ Data Transfer: AWS Direct Connect, ISV Connectors, Amazon Kinesis Firehose, Storage Gateway, S3 Transfer Acceleration, AWS Snowball, Amazon CloudFront, Internet/VPN
Orchestration
Managed: AWS Batch, AWS Lambda
§ Fully-managed services
§ Run large-scale compute workloads or simple functions
§ Focus on your jobs and their resources instead of the infrastructure
Un-Managed: CfnCluster, Traditional Scheduler
§ Quickly deploy a cluster using third-party schedulers
§ Bring your own scheduler or use AWS Marketplace solutions
Application Services: AWS Step Functions, Amazon SWF
§ Design and orchestrate workflows, with support for branching and callouts to other AWS services
§ Easily integrated with AWS Batch, AWS Lambda…
Network Performance
AWS Proprietary Network, 10Gbps & 25Gbps
§ Highest performance in largest EC2 instance sizes
§ Full bisection bandwidth in Placement Groups, with no network oversubscription
Enhanced Networking
§ Over 1M PPS performance, reduced instance-to-instance
latencies, more consistent network performance
EC2 to S3
§ Traffic to and from S3 can now take advantage of up to 25 Gbps
of bandwidth
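One quick way to confirm that enhanced networking is active is to look at the driver reported by `ethtool -i <interface>`. The sketch below parses that output from stdin; the driver names `ena` (Elastic Network Adapter) and `ixgbevf` (Intel 82599 Virtual Function) follow the EC2 enhanced-networking documentation, while the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Report which enhanced networking driver an interface uses, based on the
# output of `ethtool -i <iface>` supplied on stdin.
#   ena     -> Elastic Network Adapter (up to 25 Gbps on supported instances)
#   ixgbevf -> Intel 82599 Virtual Function interface (up to 10 Gbps)
# Usage: ethtool -i eth0 | enhanced_networking_driver
enhanced_networking_driver() {
  local driver
  # ethtool prints a line such as "driver: ena"; keep only the value.
  driver=$(awk -F': ' '$1 == "driver" { print $2; exit }')
  case "$driver" in
    ena|ixgbevf) echo "$driver" ;;
    *) echo "none (driver: ${driver:-unknown})"; return 1 ;;
  esac
}
```

A non-zero exit status means the interface is not using either enhanced networking driver.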
Graphics and Collaboration with DCV and AppStream
Pre- and post-processing as well as HPC
§Use GPUs in the cloud for remote
rendering and remote desktops
Collaborating Securely
§Encrypt the data in flight and at rest
§Manage your own keys and credentials
§Deliver pixels to your collaborators, not the
actual data
Deploying HPC on AWS
§ 3D graphics virtual workstation
§ License managers and cluster head nodes with job schedulers
§ Cloud-based, auto-scaling HPC clusters
§ Shared file storage
§ Storage cache
§ Amazon S3 and Amazon Glacier
§ AWS Batch
§ AppStream 2.0
§ Thin or zero client (no local data)
§ Corporate datacenter: on-premises HPC resources
§ AWS Snowball and AWS Direct Connect
On AWS, secure and well-optimized HPC clusters can be automatically created, operated, and torn down in just minutes
“Launching new instances and running tests in
parallel is easy…[when choosing an instance]
there is no substitute for measuring the
performance of your full application.”
—EC2 documentation
Customer Use Cases
Several Kinds of HPC Workloads
HPC workloads vary along two axes:
§ Clustered (tightly coupled) vs. Distributed / Grid (loosely coupled)
§ Data Light (minimal requirements for high performance storage) vs. Data Heavy (benefits from access to high performance storage)
Examples:
• Fluid dynamics
• Weather forecasting
• Materials simulations
• Crash simulations
• Risk simulations
• Molecular modeling
• Contextual search
• Logistics simulations
• Animation and VFX
• Semiconductor verification
• Image processing/GIS
• Genomics
• Seismic processing
• Metagenomics
• Astrophysics
• Deep learning
HPC Grids in Financial Services
“Using AWS helps us
reduce a 10-day
process to 10 minutes.
That’s transformative: it
broadens our ability to
discover.”
Peter Phillips
Managing Director
Aon Benfield Securities
Using GPU acceleration
The Challenge
§ Spinning up large numbers of GPUs quickly and inexpensively to meet the financial modeling and reporting needs of ABSI’s customers
§ ABSI uses proprietary algorithms (Monte Carlo simulations) that run millions of times
The Solution
§ ABSI moved its infrastructure to AWS and retired its co-located data center
§ ABSI built a front end on AWS for its processing solution that automatically runs GPU instances on Amazon EC2, using EBS, inside an Amazon VPC for security
The Result
§ Up to 500 times more efficient in terms of performance per dollar for some clients
HPC clusters in Healthcare & Life Sciences
“By spinning up a few hundred
nodes on AWS and getting results in
less than a day, our scientific
researchers have a lot more
freedom to ask questions that
weren’t even possible before. The
speed is important, but equally
important is the additional
intellectual curiosity this enables for
researchers”
Lance Smith
Associate Director of IT, Celgene
HPC on AWS for Cancer Drug Research
The Challenge
§ Slow time to results due to wait times and long job run times on the fixed configurations available
§ Hard to collaborate with external entities due to security and compliance issues
§ Inability to scale beyond the fixed number of cores available on premises
The Solution
§ The company runs many HPC workloads on hundreds of Amazon EC2 instances and uses Amazon S3 and Amazon Glacier to store hundreds of terabytes of genomic data
§ Amazon VPC, AWS Identity and Access Management (IAM), and AWS Direct Connect let it collaborate securely
The Result
§ HPC job time reduced from weeks to hours
§ More parallel work achieved, increasing productivity
HPC in Design & Engineering
Boom leverages Rescale and AWS to enable supersonic travel
“Rescale’s ScaleX cloud
platform is a game-changer
for engineering. It gives
Boom computing resources
comparable to building a
large on-premise HPC
center. Rescale lets us
move fast with minimal
capital spending and
resources overhead.”
Josh Krall
CTO & Co-Founder
§ Simulated vortex lift with 200M cell models on 512+ cores
§ Increased simulation throughput: 100 jobs in parallel with 6x
speedup per job → 600x speedup
§ Eliminated IT overhead, including server capital costs & in-house IT
and software costs
§ Elastic HPC capacity and pay-as-you-go AWS clusters allow business
agility & ability to scale
1.1M vCPUs for machine learning
A group of researchers from Clemson University achieved a remarkable milestone while studying topic modeling, an important component of machine learning associated with natural language processing: they broke the record for the largest high-performance cluster in the cloud by using more than 1,100,000 vCPUs on Amazon EC2 Spot Instances running in a single AWS region.
§ The graph highlights the elastic, automatic expansion of resources.
§ Clemson took advantage of the new per-second billing for EC2 instances.
§ The vCPU count is comparable to the core count of the largest supercomputers in the world.
Architecture diagram components: job script, CloudyCluster provisioning and workflow automation software, APIs, login node, SLURM scheduler, CCQ, Auto Scaling, Spot Fleet, S3, DynamoDB (DDB), VPC
https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
Thank you, and how can I help you run
HPC workloads on AWS?
aws.amazon.com/hpc
HPC on AWS Deep Dive
In the past
§ First launched in August 2006
§ M1 instance
§ “One size fits all”
Elastic Compute Cloud (EC2) Instance Naming
c5.9xlarge: instance family “c”, instance generation “5”, instance size “9xlarge”
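The naming scheme can be split mechanically. This is a minimal sketch, assuming names follow the pattern family letters, generation digits, optional attribute letters (such as the “d” in c5d or the “e” in x1e), a dot, then the size; the function name and the “attributes” field are illustrative labels, not official terminology.

```shell
#!/usr/bin/env bash
# Split an EC2 instance type such as "c5.9xlarge" into its parts.
# Assumption: <family letters><generation digits>[attribute letters].<size>
# Usage: parse_instance_type <instance-type>
parse_instance_type() {
  local re='^([a-z]+)([0-9]+)([a-z]*)\.([a-z0-9]+)$'
  if [[ $1 =~ $re ]]; then
    printf 'family=%s generation=%s attributes=%s size=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "${BASH_REMATCH[3]:-none}" "${BASH_REMATCH[4]}"
  else
    echo "unrecognized instance type: $1" >&2
    return 1
  fi
}
```

For example, `parse_instance_type c5d.18xlarge` reports family c, generation 5, attribute d, size 18xlarge.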
Instance sizing
c5.18xlarge ≈ 2 x c5.9xlarge ≈ 4 x c5.4xlarge ≈ 8 x c5.2xlarge
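The “≈” rather than “=” comes from the vCPU counts. This sketch assumes the common convention that large = 2 vCPUs, xlarge = 4 vCPUs, and Nxlarge = 4 x N vCPUs, which holds for C5 but not for every family (c4.8xlarge, for instance, has 36 vCPUs rather than 32), so treat it as a rule of thumb and check the instance-type documentation.

```shell
#!/usr/bin/env bash
# Convert an instance size suffix into a vCPU count, assuming
# large = 2, xlarge = 4, Nxlarge = 4*N vCPUs (true for C5, not universal).
# Usage: size_to_vcpus <size>, e.g. size_to_vcpus 9xlarge
size_to_vcpus() {
  case "$1" in
    large)  echo 2 ;;
    xlarge) echo 4 ;;
    *xlarge)
      local n=${1%xlarge}   # "18xlarge" -> "18"
      [[ $n =~ ^[0-9]+$ ]] || { echo "bad size: $1" >&2; return 1; }
      echo $(( 4 * n )) ;;
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}
```

Under this convention c5.18xlarge and 2 x c5.9xlarge both give 72 vCPUs, while 4 x c5.4xlarge and 8 x c5.2xlarge give 64, hence “approximately equal”.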
Original EC2 Host Architecture
§ All resources were on the server: hypervisor; management, security, and monitoring; storage; network; and customer instances
§ Instance Goals:
• Security
• Performance
• Familiarity
EC2 C5 Instance
§ Storage, network, and management, security, and monitoring functions are offloaded from the server to Nitro
§ Nearly 100% of the server's compute resources are available to customer workloads
§ Improved security
C5 Instances - Intel® XEON® Scalable Processor
§ Intel Skylake @ 3.0 GHz (turbo to 3.5 GHz)
§ Supports AVX-512
§ C-state controls
§ Nitro System, a combination of dedicated hardware and lightweight hypervisor
§ Up to 25 Gbps network
C5 vs. C4:
§ C5: 72 vCPUs, “Skylake” with AVX-512, 144 GiB memory, 12 Gbps to EBS
§ C4: 36 vCPUs, “Haswell”, 60 GiB memory, 4 Gbps to EBS
§ C5 offers 2X the vCPUs, 3X the EBS throughput, and 2.4X the memory
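Before relying on AVX-512 code paths, it can help to confirm that the instance actually advertises the feature. This minimal sketch looks for the `avx512f` (AVX-512 Foundation) flag on the “flags” line of a cpuinfo-style file, defaulting to Linux's /proc/cpuinfo; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Check whether the CPU advertises AVX-512 Foundation (avx512f).
# Usage: has_avx512 [cpuinfo_file]   (default: /proc/cpuinfo)
has_avx512() {
  local cpuinfo=${1:-/proc/cpuinfo}
  # The first "flags" line lists CPU features as space-separated words.
  if grep -m1 '^flags' "$cpuinfo" | grep -qw avx512f; then
    echo yes
  else
    echo no
    return 1
  fi
}
```

On a C5 instance `has_avx512` should print yes; on a Haswell-based C4 it should print no.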
Performance Considerations
Test using real-world examples
§ Use large cases for testing: do
not benchmark scalability
using only small examples
MPI libraries
§ Test with Intel MPI and Open MPI 3.0, and make use of available tunings
Domain decomposition
§ Choose the number of cells per core to favor either per-core efficiency or faster results
Network
§ Use a placement group
§ Enable enhanced networking
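The cells-per-core choice is simple arithmetic. The sketch below computes cells per core for a given core count, and the (rounded-up) core count needed to reach a target cells-per-core figure; the function names are illustrative, and the target value in the usage is a hypothetical number, not a recommendation.

```shell
#!/usr/bin/env bash
# Domain decomposition arithmetic for a mesh-based solver.
# Fewer cells per core usually means faster results; more cells per core
# usually means better per-core efficiency.
cells_per_core() {    # usage: cells_per_core <total_cells> <cores>
  echo $(( $1 / $2 ))
}
cores_for_target() {  # usage: cores_for_target <total_cells> <target_cells_per_core>
  echo $(( ($1 + $2 - 1) / $2 ))   # integer ceiling division
}
```

For the Boom case study's 200M-cell model on 512 cores, `cells_per_core 200000000 512` gives 390,625 cells per core; `cores_for_target 200000000 300000` gives 667 cores for a hypothetical 300,000-cell target.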
Application Compilation and Run
§ Can your application use “hybrid mode”?
§ Mix of MPI and OpenMP
§ WRF is a great example
§ Are you using the Intel compiler (many AWS instances run on Intel processors)?
§ Compile
§ Runtime
Consider installing the Intel compiler on AWS:
https://software.intel.com/en-us/articles/install-intel-parallel-studio-xe-on-amazon-web-services-aws
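For hybrid mode, the main decision is how many MPI ranks to place per node given the OpenMP threads per rank. This sketch assumes one thread per physical core (no Hyper-Threading), so pass physical cores, e.g. 36 for a 72-vCPU c5.18xlarge; the function name is hypothetical, and launching is left to your MPI library's own ranks-per-node option.

```shell
#!/usr/bin/env bash
# Work out an MPI x OpenMP ("hybrid mode") layout.
# Assumption: one OpenMP thread per physical core.
# Usage: hybrid_layout <nodes> <physical_cores_per_node> <threads_per_rank>
hybrid_layout() {
  local nodes=$1 cores=$2 threads=$3
  if (( cores % threads != 0 )); then
    echo "threads_per_rank ($threads) must divide cores_per_node ($cores)" >&2
    return 1
  fi
  local ranks_per_node=$(( cores / threads ))
  printf 'ranks_per_node=%d total_ranks=%d OMP_NUM_THREADS=%d\n' \
    "$ranks_per_node" $(( nodes * ranks_per_node )) "$threads"
}
```

For example, `hybrid_layout 2 36 9` suggests 4 ranks per node, 8 ranks in total, each with `OMP_NUM_THREADS=9`.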
What’s a Virtual CPU? (vCPU)
§ A vCPU is typically one hyperthread of an Intel physical core*
§ On Linux, the first (“A”) thread of every core is enumerated before the second (“B”) threads
§ On Windows, threads are interleaved
§ Divide the vCPU count by 2 to get the core count
§ Cores by EC2 & RDS DB Instance type:
https://aws.amazon.com/ec2/virtualcores/
* The “T” family is special
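Instead of dividing by 2, the physical core count can be read from the topology. In `lscpu --parse=CORE,SOCKET` output, each non-comment line describes one vCPU, and hyperthread siblings share a core/socket pair, so counting unique pairs gives physical cores. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Count physical cores from `lscpu --parse=CORE,SOCKET` output on stdin.
# Sibling hyperthreads repeat the same "<core>,<socket>" pair.
# Usage: lscpu --parse=CORE,SOCKET | count_physical_cores
count_physical_cores() {
  grep -v '^#' | sort -u | wc -l | tr -d ' '
}
```

With four vCPUs where CPUs 0-1 are the “A” threads and CPUs 2-3 their “B” siblings, the core column reads 0, 1, 0, 1, and the function reports 2 cores.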
Disable Hyper-Threading (on the OS)
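§ Useful for CPU-heavy applications
§ Use ‘lscpu’ to validate the layout
§ Disable Hyper-Threading without a reboot by taking each core's sibling thread offline:
# thread_siblings_list holds entries such as "0,36"; every CPU after the
# first on each list is a hyperthread sibling to take offline.
for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list |
    cut -s -d, -f2- | tr ',' '\n' | sort -un); do
  echo 0 | sudo tee /sys/devices/system/cpu/cpu${cpunum}/online
done
§ Or set grub to only initialize the first half of all threads, e.g. on an instance with 128 vCPUs add this kernel parameter:
maxcpus=64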
Timekeeping Explained
§ Timekeeping in an instance is deceptively hard
§ gettimeofday(), clock_gettime(), QueryPerformanceCounter()
§ The TSC
§ CPU counter, accessible from userspace
§ Requires calibration, vDSO
§ Invariant on Sandy Bridge+ processors
§ Xen pvclock; does not support vDSO
§ On current generation instances, use TSC as clocksource
Use TSC as clocksource
Check with:
Change with:
Change at boot time with (e.g. /etc/default/grub):
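A minimal sketch of the check and change steps, assuming the standard Linux sysfs clocksource files (`current_clocksource` and `available_clocksource`). The directory is overridable through a hypothetical `CLOCKSOURCE_DIR` variable so the logic can be tried on scratch files; writing the real sysfs file needs a root shell, because with `sudo echo tsc > file` the redirection still runs as your user.

```shell
#!/usr/bin/env bash
# Check and switch the Linux clocksource through sysfs.
CLOCKSOURCE_DIR=${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}

current_clocksource() {
  cat "$CLOCKSOURCE_DIR/current_clocksource"
}

use_tsc_clocksource() {
  # Only switch if the kernel lists tsc as an available clocksource.
  if ! grep -qw tsc "$CLOCKSOURCE_DIR/available_clocksource"; then
    echo "tsc is not an available clocksource" >&2
    return 1
  fi
  echo tsc > "$CLOCKSOURCE_DIR/current_clocksource"
}
```

For the boot-time change, the usual approach is to add the kernel parameter `clocksource=tsc` to the kernel command line in /etc/default/grub and regenerate the grub configuration with your distribution's tool.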
Storage
HPC Data Flow on AWS Storage
§ Data transfer (ingress and egress) between the corporate data center and AWS: AWS Direct Connect, ISV Connectors, Storage Gateway, AWS Snowball, Internet/VPN, Amazon Kinesis Firehose, S3 Transfer Acceleration, Amazon CloudFront
§ Object, block, and file storage: Amazon S3, with lifecycle policies to Amazon Glacier; EBS and Instance Store on the EC2 instance; EFS and other shared file systems
§ 25 Gbps between EC2 and S3
Instance Store
Temporary block-level storage, physically attached to the host computer
Lifetime
• Data is lost when:
  • a drive fails
  • the instance stops
  • the instance terminates
• Data persists on reboot
Instance store data loss prevention:
• Create RAID 1/5/6
• Move data to S3 or EBS
• Create a fault-tolerant file system
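A cautious way to build a redundant array over instance store devices is to generate the `mdadm` command first and review it before running it with sudo. This sketch only prints the command; the md device and NVMe device names in the usage are placeholders that depend on the instance, and the function name is hypothetical. Redundant RAID protects against drive failure, but the array is still lost when the instance stops or terminates.

```shell
#!/usr/bin/env bash
# Print (not run) an mdadm command that builds a redundant software RAID
# array from instance store devices, for review before running with sudo.
# Usage: mdadm_create_cmd <md_device> <raid_level> <device>...
mdadm_create_cmd() {
  local md=$1 level=$2
  shift 2
  case "$level" in
    1|5|6) ;;
    *) echo "use RAID level 1, 5, or 6 for redundancy" >&2; return 1 ;;
  esac
  echo "mdadm --create $md --level=$level --raid-devices=$# $*"
}
```

For example, `mdadm_create_cmd /dev/md0 1 /dev/nvme0n1 /dev/nvme1n1` prints a two-device RAID 1 command.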
EBS Volume Types
SSD
• General Purpose SSD (gp2): balances price and performance for a wide variety of transactional data
• Provisioned IOPS SSD (io1): latency-sensitive transactional workloads
HDD
• Throughput Optimized HDD (st1): frequently accessed, throughput-intensive workloads
• Cold HDD (sc1): less frequently accessed data
EBS – Elastic Block Store
Two block storage options for EC2 instances: EBS and Instance Store
Block device mapping: the instance's devices (/dev/xvda, /dev/xvdb, /dev/xvdc, /dev/xvdd) map either to EBS volumes (vol-xxxxxxxx) or to instance store volumes (ephemeral0, ephemeral1)
EBS Performance
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html - As of July 25, 2017
Choose the right instance:
| Instance type | EBS-optimized by default | Max EBS bandwidth (Mbps)* | Expected throughput (MB/s)** | Max IOPS (16 KB I/O size)** | Max network bandwidth | 3-year Reserved $/hour (N. Virginia) |
|---------------|--------------------------|---------------------------|------------------------------|-----------------------------|-----------------------|--------------------------------------|
| r4.16xlarge   | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $1.600 |
| m4.16xlarge   | Yes | 10,000 | 1,250 | 65,000 | 25 Gb/s | $1.203 |
| c5.18xlarge   | Yes |  9,000 | 1,125 | 64,000 | 25 Gb/s | $1.928 |
| g3.16xlarge   | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $2.023 |
| i3.16xlarge   | Yes | 14,000 | 1,750 | 65,000 | 25 Gb/s | $2.112 |
| f1.16xlarge   | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $5.734 |
| p2.16xlarge   | Yes | 10,000 | 1,250 | 65,000 | 25 Gb/s | $6.392 |
| x1.32xlarge   | Yes | 10,000 | 1,250 | 65,000 | 10 Gb/s | $3.732 |
RAID multiple EBS volumes together to achieve max performance
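When striping EBS volumes, the per-volume throughputs add up, but the instance's EBS bandwidth caps the total; dividing the Mbps figure by 8 gives the MB/s in the “Expected throughput” column. This sketch assumes identical volumes and RAID 0 striping (which adds no redundancy); the per-volume throughput in the usage is a hypothetical value, not a published EBS limit.

```shell
#!/usr/bin/env bash
# Estimate aggregate throughput from striping (RAID 0) N identical EBS
# volumes, capped by the instance's max EBS bandwidth (Mbps / 8 = MB/s).
# Usage: striped_throughput <volumes> <per_volume_MBps> <instance_max_ebs_Mbps>
striped_throughput() {
  local vols=$1 per_vol=$2 cap=$(( $3 / 8 ))
  local total=$(( vols * per_vol ))
  echo $(( total < cap ? total : cap ))
}
```

With a hypothetical 500 MB/s per volume, four volumes on an r4.16xlarge (14,000 Mbps) hit the 1,750 MB/s instance cap, while two volumes deliver 1,000 MB/s.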
Storage Classes and Tiering on Amazon S3
Storage classes: Standard, Standard - Infrequent Access, Amazon Glacier
Typical uses across the tiers:
• Primary data
• Big Data Analytics
• Small objects
• Temporary scratch space
• Archive data
• Deep/offline archives
• Tape vaulting replacement
• WORM-compliant data
• File sync and share
• Active Archive
• Enterprise backup
• Media transcoding
• Geo-redundancy/DR
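Tiering between these classes can be automated with an S3 lifecycle configuration. This sketch emits one rule that moves objects under a prefix to Standard-IA and then to Glacier after the given ages in days; the prefix, day counts, and bucket name in the usage are placeholders, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Emit an S3 lifecycle configuration that tiers objects under a prefix from
# Standard to Standard-IA, then to Glacier, after the given ages in days.
# Usage: lifecycle_json <prefix> <days_to_ia> <days_to_glacier>
lifecycle_json() {
  local prefix=$1 ia_days=$2 glacier_days=$3
  cat <<EOF
{
  "Rules": [
    {
      "ID": "tier-${prefix%/}",
      "Filter": { "Prefix": "${prefix}" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": ${ia_days}, "StorageClass": "STANDARD_IA" },
        { "Days": ${glacier_days}, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}
```

Usage: `lifecycle_json results/ 30 90 > lifecycle.json`, then `aws s3api put-bucket-lifecycle-configuration --bucket my-hpc-results --lifecycle-configuration file://lifecycle.json`.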
File Systems on AWS
§ EFS – Elastic File System
§ Distributed across multiple AZs
§ Petabyte-scale
§ Easy to bring up, no management
§ Build your own NFS
§ Use for a POC
§ Ephemeral data (i3.*)
§ Parallel file systems
§ Build your own or use APN solutions
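Mounting EFS on a cluster node is a standard NFSv4.1 mount against the file system's DNS name. This sketch prints the command using the mount options the EFS documentation has recommended; the current EFS user guide may list more (such as `noresvport`). The file system ID, region, and mount point are placeholders, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the NFSv4.1 mount command for an EFS file system.
# Usage: efs_mount_cmd <file_system_id> <region> <mount_point>
efs_mount_cmd() {
  local fs_id=$1 region=$2 mnt=$3
  local opts="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"
  # EFS DNS names follow <file-system-id>.efs.<region>.amazonaws.com
  echo "sudo mount -t nfs4 -o $opts ${fs_id}.efs.${region}.amazonaws.com:/ $mnt"
}
```

Printing rather than running the command makes it easy to paste into a provisioning script after review.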
Automation and Batch Processing
Traditional Job Schedulers Integrate Easily
Bring your scheduler to AWS, or build your own
§ IBM Platform LSF
§ Univa Grid Engine
§ Altair PBS Pro
§ SLURM
§ Design your own using AWS services
§ Do you actually need a scheduler?
AWS Batch
§ AWS Batch dynamically provisions resources
§ Plans, schedules, and executes your batch jobs
§ No batch software to install
Focus on your applications and results!
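A typical HPC pattern on AWS Batch is an array job: one submission fans out into many child jobs, and each child reads its index from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable. This sketch wraps `aws batch submit-job`; the job, queue, and job definition names are placeholders, and the `AWS_CLI` override is a hypothetical convenience for dry runs.

```shell
#!/usr/bin/env bash
# Submit an AWS Batch array job of <size> child jobs.
# AWS_CLI is overridable (e.g. AWS_CLI=echo) for dry runs.
# Usage: submit_array_job <job_name> <job_queue> <job_definition> <size>
AWS_CLI=${AWS_CLI:-aws}
submit_array_job() {
  local name=$1 queue=$2 definition=$3 size=$4
  $AWS_CLI batch submit-job \
    --job-name "$name" \
    --job-queue "$queue" \
    --job-definition "$definition" \
    --array-properties "size=$size"
}
```

For example, `submit_array_job sweep hpc-queue solver:1 100` would launch 100 child jobs that Batch schedules onto the compute environment behind `hpc-queue`.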
HPC Automation and Orchestration
Choose from several options to adapt your workloads
§ CfnCluster
§ AWS Batch
§ AWS-NICE DCV and EnginFrame
§ Build your own CloudFormation templates
§ ISV offerings on AWS Marketplace, or work with a systems integrator (SI)
Launch a Cluster in minutes
§ Cluster creation usually
takes ~15 minutes
§ Completely managed by
CloudFormation
$ cfncluster create mycluster
CfnCluster Configuration Options
§ Operating System
  • Amazon Linux
  • CentOS 6
  • CentOS 7
  • Ubuntu 14.04
§ Scheduler
  • Sun Grid Engine (SGE)
  • PBS/Torque
  • SLURM
§ Storage Size & IOPS
  • EBS & Instance Store
  • Encryption
§ Scaling Speed & Limits
§ Provisioning Scripts
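These options live in the CfnCluster configuration file read by `cfncluster create`. The sketch below writes a minimal file using CfnCluster 1.x-style section and key names; verify them against the documentation for your CfnCluster version. Every value (region, key pair, VPC and subnet IDs, instance types, queue sizes) is a placeholder, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Write a minimal CfnCluster configuration file. All values are placeholders.
# Usage: write_cfncluster_config [path]   (default: ~/.cfncluster/config)
write_cfncluster_config() {
  local path=${1:-$HOME/.cfncluster/config}
  mkdir -p "$(dirname "$path")"
  cat > "$path" <<'EOF'
[aws]
aws_region_name = us-east-1

[global]
cluster_template = default
update_check = true
sanity_check = true

[cluster default]
key_name = my-key-pair
vpc_settings = public
base_os = centos7
scheduler = slurm
master_instance_type = c5.xlarge
compute_instance_type = c5.18xlarge
initial_queue_size = 0
max_queue_size = 16
placement_group = DYNAMIC

[vpc public]
vpc_id = vpc-xxxxxxxx
master_subnet_id = subnet-xxxxxxxx
EOF
}
```

After editing the placeholders, `cfncluster create mycluster` builds the cluster from the `default` template.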
License Server Management
§ FLEXlm works natively on
AWS
§ Each EC2 instance has a
unique hostname & hardware
address that can’t be spoofed
§ Set your license server's ENI (Elastic Network Interface) so that “Delete on termination” is disabled
§ Allows for simple license
failover and migration
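Disabling “Delete on termination” keeps the ENI, and therefore its MAC address (which FLEXlm host IDs are commonly tied to), so it can be re-attached to a replacement license server. This sketch wraps `aws ec2 modify-network-interface-attribute`; the ENI and attachment IDs are placeholders (the attachment ID can be looked up with `aws ec2 describe-network-interfaces`), and the `AWS_CLI` override is a hypothetical convenience for dry runs.

```shell
#!/usr/bin/env bash
# Keep a license server's ENI when its instance is terminated.
# AWS_CLI is overridable (e.g. AWS_CLI=echo) for dry runs.
# Usage: keep_eni_on_termination <eni_id> <attachment_id>
AWS_CLI=${AWS_CLI:-aws}
keep_eni_on_termination() {
  $AWS_CLI ec2 modify-network-interface-attribute \
    --network-interface-id "$1" \
    --attachment "AttachmentId=$2,DeleteOnTermination=false"
}
```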
HPC Architecture on AWS
Diagram components: corporate data center (data ingress/egress), availability zone, master instance ($ qsub job.sh), autoscaling group, EBS, EFS, local NFS, parallel FS, S3
§ Three file systems: EFS, local NFS, and parallel FS
§ Snapshot of EBS to S3
§ Data tiering from FS to S3
§ Auto Scaling allows for scaling when needed
POC using EFS and EBS
Components: an r4.16xlarge instance in one availability zone, S3, EFS mounted at /lib and /binary, EBS mounted at /scratch, and DynamoDB
1. Copy data from S3 to EBS on startup
2. Start the job ($ ./run_job)
3. Use EBS volumes while the job is running
4. Access mounted EFS directories (/lib and /binary) while the job is running
5. Record pass/fail
6. Update data with the delta
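The six POC steps can be sketched as one shell function for a single job. This is an illustration under stated assumptions: the bucket, DynamoDB table (with a `JobId` key), and scratch path are placeholders, the EFS mounts are assumed to be in place already, recording pass/fail in DynamoDB is inferred from the diagram, and `AWS_CLI` and `RUN_JOB` are hypothetical overrides for dry runs.

```shell
#!/usr/bin/env bash
# Run one POC job: stage data from S3, run, record pass/fail, sync back.
# Usage: run_poc_job <job_id> <bucket> <dynamodb_table> [scratch_dir]
AWS_CLI=${AWS_CLI:-aws}
run_poc_job() {
  local job_id=$1 bucket=$2 table=$3 scratch=${4:-/scratch}
  local status=PASS
  # 1. Copy input data from S3 to the EBS-backed scratch volume.
  $AWS_CLI s3 sync "s3://$bucket/data" "$scratch/data" || return 1
  # 2-4. Start the job; it works on /scratch (EBS) and reads the
  #      EFS-mounted /lib and /binary directories while running.
  ( cd "$scratch" && ${RUN_JOB:-./run_job} ) || status=FAIL
  # 5. Record pass/fail in DynamoDB.
  $AWS_CLI dynamodb put-item --table-name "$table" \
    --item "{\"JobId\": {\"S\": \"$job_id\"}, \"Status\": {\"S\": \"$status\"}}" || return 1
  # 6. Push changed files back to S3 (sync uploads only new or updated files).
  $AWS_CLI s3 sync "$scratch/data" "s3://$bucket/data" || return 1
  [ "$status" = PASS ]
}
```

The function's exit status mirrors the job result, so it can be chained into a larger test loop.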
Visualization
Diagram: corporate data center; availability zone with the modeling cluster and a GPU instance
§ Use a GPU-optimized instance and AWS-NICE DCV to visualize results
Security Overview on AWS
AWS Shared Responsibility Model
Compliance Programs
Programs are grouped as Global, United States, Asia Pacific, and Europe, including SOC 1, SOC 2, and SOC 3 (Global)
https://aws.amazon.com/compliance/pci-data-privacy-protection-hipaa-soc-fedramp-faqs/
It is always YOUR data!
§ Customers choose where to place their data
§ AWS regions are geographically isolated by design
§ Data is not replicated to other AWS regions and does not move unless the customer tells us to do so
§ Customers always own their data and keep the ability to encrypt it, move it, and delete it
AWS Customer Agreement
https://aws.amazon.com/agreement/
Ubiquitous, Fully-Managed Encryption
§ Encrypted in transit and at rest: Amazon EC2, EBS, RDS, Amazon Redshift, S3, Amazon Glacier
§ Key options: fully managed keys in KMS, imported keys, or your own KMI
§ Fully auditable with AWS CloudTrail
§ Restricted access with IAM
Media Destruction
From this → To this
Cost Optimization
Cost Optimization: Weather Forecasting and Modeling
§ Reserved Instances: daily forecasts (00z, 06z, 12z, 18z)
§ Spot: climate modeling
§ On Demand: weather events such as hurricanes
Spot Instance details
Options
• Spot Fleet to maintain instance availability
• Spot block durations (1-6 hours) for workloads that must run continuously
Commitment level
• None
* Compared to On Demand price based on specific EC2 instance type, region and availability zone
ec2FleetCompare - Spot example
$ ./ec2FleetCompare -n 20 -i c5.18xlarge
+--------+-------------+------+-----------+-------+------------+---------+---------+-------------------+-------------------+----------+----
| # INST | TYPE | VCPU | VCPU FREQ | MEM | NETWORK | IS TYPE | IS SIZE | DEMAND/HOUR | SPOT/HOUR | SPOT SAV | DEM
+--------+-------------+------+-----------+-------+------------+---------+---------+-------------------+-------------------+----------+----
| 20 | c5.18xlarge | 72 | 3.0 Ghz | 144.0 | 25 Gigabit | N/A | N/A | $61.20 ($3.06 ea) | $21.66 ($1.08 ea) | 65% | $44
+--------+-------------+------+-----------+-------+------------+---------+---------+-------------------+-------------------+----------+----
65% Savings!
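The savings figure is simply the fraction of the On-Demand cost avoided. This sketch reproduces it from the two hourly fleet totals in the output above; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Percentage saved by Spot versus On-Demand, rounded to a whole percent.
# Usage: spot_savings_pct <on_demand_per_hour> <spot_per_hour>
spot_savings_pct() {
  awk -v d="$1" -v s="$2" 'BEGIN { printf "%.0f\n", (d - s) / d * 100 }'
}
```

`spot_savings_pct 61.20 21.66` gives 65, matching the 20-instance c5.18xlarge comparison.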
Why use Spot – customer examples
“By using AWS Spot instances, we've been able to save 75% a month simply by changing four lines of code. It makes perfect sense for saving money when you're running continuous integration workloads or pipeline processing.” - Matthew Leventi, Lead Engineer, Lyft
AWS Data Centers
Take a virtual tour of an AWS data center: aws.amazon.com/compliance/data-center
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you, and how can I help you run
HPC workloads on AWS?
aws.amazon.com/hpc

More Related Content

What's hot

Introduction to AWS IAM
Introduction to AWS IAMIntroduction to AWS IAM
Introduction to AWS IAMKnoldus Inc.
 
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...
Instrumenting Kubernetes for Observability Using AWS X-Ray and Amazon CloudWa...Amazon Web Services
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나Amazon Web Services Korea
 
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저
AWS 클라우드 비용 최적화를 위한 TIP - 임성은 AWS 매니저Amazon Web Services Korea
 
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...
Amazon GuardDuty: Intelligent Threat Detection and Continuous Monitoring to P...Amazon Web Services
 
A Brief Look at Serverless Architecture
A Brief Look at Serverless ArchitectureA Brief Look at Serverless Architecture
A Brief Look at Serverless ArchitectureAmazon Web Services
 
AWS CodeDeploy, AWS CodePipeline, and AWS CodeCommit: Transforming Software D...
AWS CodeDeploy, AWS CodePipeline, and AWS CodeCommit: Transforming Software D...AWS CodeDeploy, AWS CodePipeline, and AWS CodeCommit: Transforming Software D...
AWS CodeDeploy, AWS CodePipeline, and AWS CodeCommit: Transforming Software D...Amazon Web Services
 
Amazon SageMaker 모델 빌딩 파이프라인 소개::이유동, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스...
Amazon SageMaker 모델 빌딩 파이프라인 소개::이유동, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스...Amazon SageMaker 모델 빌딩 파이프라인 소개::이유동, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스...
Amazon SageMaker 모델 빌딩 파이프라인 소개::이유동, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스...Amazon Web Services Korea
 
AWS 101: Introduction to AWS
AWS 101: Introduction to AWSAWS 101: Introduction to AWS
AWS 101: Introduction to AWSIan Massingham
 
Best Practices for CI/CD with AWS Lambda and Amazon API Gateway (SRV355-R1) -...
Best Practices for CI/CD with AWS Lambda and Amazon API Gateway (SRV355-R1) -...Best Practices for CI/CD with AWS Lambda and Amazon API Gateway (SRV355-R1) -...
Best Practices for CI/CD with AWS Lambda and Amazon API Gateway (SRV355-R1) -...Amazon Web Services
 
Arm 기반의 AWS Graviton 프로세서로 구동되는 AWS 인스턴스 살펴보기 - 김종선, AWS솔루션즈 아키텍트:: AWS Summi...
Arm 기반의 AWS Graviton 프로세서로 구동되는 AWS 인스턴스 살펴보기 - 김종선, AWS솔루션즈 아키텍트:: AWS Summi...Arm 기반의 AWS Graviton 프로세서로 구동되는 AWS 인스턴스 살펴보기 - 김종선, AWS솔루션즈 아키텍트:: AWS Summi...
Arm 기반의 AWS Graviton 프로세서로 구동되는 AWS 인스턴스 살펴보기 - 김종선, AWS솔루션즈 아키텍트:: AWS Summi...Amazon Web Services Korea
 
OpsNow를 활용한 AWS Cloud 비용 최적화 전략
OpsNow를 활용한 AWS Cloud 비용 최적화 전략OpsNow를 활용한 AWS Cloud 비용 최적화 전략
OpsNow를 활용한 AWS Cloud 비용 최적화 전략BESPIN GLOBAL
 
Getting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheGetting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheAmazon Web Services
 
AWS Control Tower
AWS Control TowerAWS Control Tower
AWS Control TowerCloudHesive
 
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)Amazon Web Services Korea
 
Architecting for High Availability
Architecting for High AvailabilityArchitecting for High Availability
Architecting for High AvailabilityAmazon Web Services
 

AWS Compute Evolved Week: High Performance Computing on AWS

  • 8. AWS Advantages for HPC Workload Types
    Workload types: tightly coupled parallel computing; loosely coupled parallel computing; accelerated computing; visualization and interpretation; high-performance data storage and analytics
    Advantages: scale; EC2 Spot pricing; early access to technology; choice; performance; derive unique insights with AI/ML; skip the queue; view results instantly
  • 9. Why HPC on AWS: Multiple Clusters
    On premises, every job waits in one shared queue:
      $ qsub -q monolith iwait.sh
    On AWS, launch clusters by group, user, or application, so there is no more waiting:
      $ qsub dev.sh
      $ qsub prod.sh
      $ qsub critical.sh
      $ qsub bigrun.sh
  • 10. Shape the compute to match the work to be done: optimize application-specific infrastructure
    § General purpose: M5, M4, T2
    § Compute optimized: C5, C4
    § Storage and I/O optimized: I3, D2
    § GPU and FPGA accelerated: P3, P2, F1
    § Memory optimized: X1, R4
  • 11. Cost Advantages
    § On premises (capital expense model): high upfront capital cost; high cost of ongoing support
    § Amazon Web Services (pay-as-you-go model): use only what you need; multiple pricing models
  • 12. EC2 Purchasing Options
    § On-Demand: pay for compute capacity by the second with no long-term commitments. Best for spiky workloads and for defining your needs.
    § Reserved: make a 1- or 3-year commitment and receive a significant discount off On-Demand prices. Best for committed, steady-state usage.
    § Spot: spare EC2 capacity at savings of up to 90% off On-Demand prices. Best for fault-tolerant, dev/test, time-flexible, and stateless workloads.
    Per-second billing applies to EC2 Linux instances and EBS volumes.
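To make the per-second billing and Spot discount on this slide concrete, here is a small cost calculator. It is a sketch under stated assumptions: the $2.00/hour On-Demand price and the 70% Spot discount are hypothetical placeholders, not real AWS prices, and it models the one-minute minimum that applies to per-second billing of Linux instances.

```python
import math

# Hypothetical numbers for illustration only; real prices vary by Region, type, and time.
ON_DEMAND_PER_HOUR = 2.00  # assumed On-Demand price in $/hour
SPOT_DISCOUNT = 0.70       # assumed Spot discount (the slide quotes "up to 90%")


def per_second_cost(runtime_s: float, price_per_hour: float, minimum_s: int = 60) -> float:
    """Cost of one instance billed per second, with a 60-second minimum charge."""
    billed_s = max(math.ceil(runtime_s), minimum_s)
    return billed_s * price_per_hour / 3600.0


def per_hour_cost(runtime_s: float, price_per_hour: float) -> float:
    """Cost of the same run if it were rounded up to whole hours, for comparison."""
    return math.ceil(runtime_s / 3600.0) * price_per_hour


runtime = 75 * 60  # a 75-minute job
print(f"per-second On-Demand: ${per_second_cost(runtime, ON_DEMAND_PER_HOUR):.2f}")
print(f"per-second Spot:      ${per_second_cost(runtime, ON_DEMAND_PER_HOUR * (1 - SPOT_DISCOUNT)):.2f}")
print(f"whole-hour On-Demand: ${per_hour_cost(runtime, ON_DEMAND_PER_HOUR):.2f}")
```

Short, bursty HPC jobs benefit most from per-second billing, because the gap between the two billing models is largest when a run ends just past an hour boundary.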
  • 13. Building an HPC Infrastructure in AWS
  • 14. Understanding the Drivers
    § What are your motivations for using cloud computing?
    § How would running on AWS differ from running on premises?
    § What would you need to launch a proof of concept (PoC) on AWS today?
    § What are your application's requirements?
    § Do you need to visualize your data?
  • 15. HPC Solutions
    § Compute: EC2 instances, EC2 Spot, Auto Scaling
    § Accelerated compute: FPGA, GPU
    § Storage: EBS, EFS, S3
    § Networking: enhanced networking, placement groups
    § Automation and orchestration: AWS Batch, CfnCluster, NICE EnginFrame
    § Visualization: NICE DCV, AppStream 2.0
  • 16. Amazon EC2 Instances: optimize the price/performance of your HPC workloads with the widest range of compute instances
    § General purpose: M4, M5, M5d (new)
    § General purpose burstable: T2, T2 Unlimited
    § Compute optimized: C4, C5, C5d (new)
    § Memory optimized: R4, R5 (new), R5d (new), X1, X1e
    § Storage optimized, dense storage: D2, H1
    § Storage optimized, high I/O: I2, I3
    § GPU compute: P2, P3
    § Graphics intensive: G2, G3
    § FPGA: F1
    § EC2 Bare Metal: direct access to physical server resources
  • 17. AWS Storage is a Platform
    § Block: Amazon EBS, Amazon EC2 Instance Store
    § File: Amazon EFS
    § Object: Amazon S3 / S3-IA, Amazon Glacier
    § Data transfer: AWS Direct Connect, ISV connectors, Amazon Kinesis Firehose, Storage Gateway, S3 Transfer Acceleration, AWS Snowball, Amazon CloudFront, internet/VPN
  • 18. Orchestration
    § Managed (AWS Batch, AWS Lambda): fully managed services that run large-scale compute workloads or simple functions, so you focus on your jobs and their resources instead of the infrastructure
    § Un-managed, traditional scheduler (CfnCluster): quickly deploy a cluster using third-party schedulers; bring your own scheduler or use AWS Marketplace solutions
    § Application services (AWS Step Functions, Amazon SWF): design and orchestrate workflows, with support for branching and callouts to other AWS services; easily integrated with AWS Batch, AWS Lambda…
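The managed path on this slide can be sketched with an AWS Batch array job, which fans a single submission out into many child jobs: a natural fit for parameter sweeps. This is a minimal sketch, not a reference implementation. The client is passed in (a boto3 `batch` client in practice, a stub in testing), and the job name, queue, and job definition names are hypothetical.

```python
from typing import Any, Dict, Protocol


class BatchClient(Protocol):
    """The subset of the boto3 AWS Batch client used here."""

    def submit_job(self, **kwargs: Any) -> Dict[str, Any]: ...


def submit_sweep(client: BatchClient, queue: str, definition: str, n_tasks: int) -> str:
    """Submit one AWS Batch array job that fans out into n_tasks child jobs.

    Each child reads the AWS_BATCH_JOB_ARRAY_INDEX environment variable
    to select its slice of the parameter sweep.
    """
    if not 2 <= n_tasks <= 10_000:
        raise ValueError("AWS Batch array jobs must have between 2 and 10,000 children")
    response = client.submit_job(
        jobName="param-sweep",          # hypothetical job name
        jobQueue=queue,
        jobDefinition=definition,
        arrayProperties={"size": n_tasks},
    )
    return response["jobId"]
```

In use, something like `submit_sweep(boto3.client("batch"), "hpc-queue", "solver:1", 500)` would queue 500 children; AWS Batch then scales the compute environment up and down to match, which is what lets you "focus on your jobs and their resources instead of the infrastructure".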
  • 19. Network Performance
    § AWS proprietary network, 10 Gbps and 25 Gbps: highest performance in the largest EC2 instance sizes; full bisection bandwidth in placement groups, with no network oversubscription
    § Enhanced Networking: over 1M packets per second (PPS), reduced instance-to-instance latencies, more consistent network performance
    § EC2 to S3: traffic to and from S3 can now use up to 25 Gbps of bandwidth
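For tightly coupled jobs, the full-bisection-bandwidth point above means launching the nodes into a cluster placement group. The sketch below does that with two real EC2 API calls (`create_placement_group` and `run_instances`). The client is injected so the logic can be tested without AWS, and the group name, AMI ID, and instance type in the usage line are hypothetical.

```python
from typing import Any, Dict, List, Protocol


class EC2Client(Protocol):
    """The subset of the boto3 EC2 client used here."""

    def create_placement_group(self, **kwargs: Any) -> Dict[str, Any]: ...

    def run_instances(self, **kwargs: Any) -> Dict[str, Any]: ...


def launch_tightly_coupled(client: EC2Client, group: str, ami: str,
                           instance_type: str, count: int) -> List[str]:
    """Create a cluster placement group and launch `count` instances into it."""
    # "cluster" packs instances close together for low latency and high bandwidth.
    client.create_placement_group(GroupName=group, Strategy="cluster")
    response = client.run_instances(
        ImageId=ami,
        InstanceType=instance_type,
        MinCount=count,  # all-or-nothing: launch nothing rather than a partial cluster
        MaxCount=count,
        Placement={"GroupName": group},
    )
    return [inst["InstanceId"] for inst in response["Instances"]]
```

Example call: `launch_tightly_coupled(boto3.client("ec2"), "cfd-pg", "ami-0123456789abcdef0", "c5.18xlarge", 16)`. Setting `MinCount` equal to `MaxCount` is a deliberate choice: an MPI job sized for 16 nodes is usually useless on 12.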
  • 20. Graphics and Collaboration with DCV and AppStream
    § Pre- and post-processing as well as HPC: use GPUs in the cloud for remote rendering and remote desktops
    § Collaborating securely: encrypt data in flight and at rest; manage your own keys and credentials; deliver pixels to your collaborators, not the actual data
  • 21. Deploying HPC on AWS
    [Architecture diagram. Components: corporate datacenter with on-premises HPC resources; AWS Direct Connect; AWS Snowball; thin (no local data) or zero clients; AppStream 2.0; 3D graphics virtual workstation; license managers and cluster head nodes with job schedulers; cloud-based, auto-scaling HPC clusters; AWS Batch; shared file storage; storage cache; Amazon S3 and Amazon Glacier.]
    On AWS, secure and well-optimized HPC clusters can be automatically created, operated, and torn down in just minutes.
  • 22. “Launching new instances and running tests in parallel is easy…[when choosing an instance] there is no substitute for measuring the performance of your full application.” (EC2 documentation)
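Once you have measured your full application on a few candidate configurations, the choice reduces to simple arithmetic: cost per run is nodes × hourly price × runtime, subject to a deadline. The sketch below assumes you already have those measurements. The instance labels, runtimes, and prices in `trials` are made-up placeholders, not benchmark results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    label: str                 # hypothetical configuration name
    nodes: int
    runtime_s: float           # measured wall-clock time of the full application
    price_per_node_hour: float  # assumed price, not a real AWS rate


def cost(trial: Trial) -> float:
    """Cost of one run: nodes x hourly price x runtime in hours."""
    return trial.nodes * trial.price_per_node_hour * trial.runtime_s / 3600.0


def pick(trials: List[Trial], deadline_s: float) -> Trial:
    """Cheapest measured configuration that still meets the deadline."""
    feasible = [t for t in trials if t.runtime_s <= deadline_s]
    if not feasible:
        raise ValueError("no measured configuration meets the deadline")
    return min(feasible, key=cost)


# Placeholder measurements for illustration only.
trials = [
    Trial("type-A", 8, 5400.0, 3.0),
    Trial("type-A", 16, 3000.0, 3.0),
    Trial("type-B", 16, 6000.0, 1.6),
]
best = pick(trials, deadline_s=3600.0)
print(best.label, best.nodes, f"${cost(best):.2f}")
```

Note that the cheapest configuration per run changes with the deadline. This is why the quote stresses measuring the whole application rather than extrapolating from a micro-benchmark.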
  • 23. Customer Use Cases
  • 24. Several Kinds of HPC Workloads
    Data light: minimal requirements for high-performance storage. Data heavy: benefits from access to high-performance storage. Clustered: tightly coupled. Distributed/grid: loosely coupled.
    § Clustered, data light: fluid dynamics, weather forecasting, materials simulations, crash simulations
    § Distributed/grid, data light: risk simulations, molecular modeling, contextual search, logistics simulations
    § Distributed/grid, data heavy: animation and VFX, semiconductor verification, image processing/GIS, genomics
    § Clustered, data heavy: seismic processing, metagenomics, astrophysics, deep learning
  • 25. HPC Grids in Financial Services: Aon Benfield Securities, Using GPU Acceleration
    “Using AWS helps us reduce a 10-day process to 10 minutes. That’s transformative: it broadens our ability to discover.” (Peter Phillips, Managing Director, Aon Benfield Securities)
    The Challenge
    § Spinning up large numbers of GPUs quickly and inexpensively to meet ABSI’s customers’ financial modeling and reporting needs
    § ABSI uses proprietary algorithms (Monte Carlo simulations) that run millions of times
    The Solution
    § ABSI moved its infrastructure to AWS and decommissioned its co-located data center
    § ABSI built a front end on AWS for its processing solution that automatically runs GPU instances on Amazon EC2, using EBS, inside an Amazon VPC for security
    The Result
    § As much as 500 times more efficient in performance per dollar for some clients
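ABSI’s algorithms are proprietary, so the following only illustrates why Monte Carlo work suits a grid. Each batch of paths is independent and needs nothing but its own random seed, so batches can be farmed out to separate instances and averaged at the end. The sketch prices a textbook European call option under geometric Brownian motion. That model is an assumption, chosen because the Black-Scholes formula gives an exact answer to check against. The parameters are arbitrary.

```python
import math
import random


def mc_call_batch(seed: int, n: int, s0: float, k: float, r: float, sigma: float, t: float) -> float:
    """One independent grid task: mean discounted payoff of a European call
    under geometric Brownian motion, using only its own random seed."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n):
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))  # terminal price
        total += max(st - k, 0.0)                               # call payoff
    return math.exp(-r * t) * total / n


def black_scholes_call(s0: float, k: float, r: float, sigma: float, t: float) -> float:
    """Closed-form reference price, used only to check the simulation."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


params = dict(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0)
# On a grid, each seed would be a separate job; here the batches simply run in a loop.
batch_means = [mc_call_batch(seed, 25_000, **params) for seed in range(8)]
estimate = sum(batch_means) / len(batch_means)  # equal batch sizes: mean of means
print(f"Monte Carlo: {estimate:.3f}  Black-Scholes: {black_scholes_call(**params):.3f}")
```

Because the only coordination is the final average, adding instances shortens wall-clock time almost linearly. The same property makes this kind of workload a good candidate for Spot capacity: a lost batch can simply be rerun with the same seed.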
  • 26. HPC Clusters in Healthcare & Life Sciences: Celgene, HPC on AWS for Cancer Drug Research
    “By spinning up a few hundred nodes on AWS and getting results in less than a day, our scientific researchers have a lot more freedom to ask questions that weren’t even possible before. The speed is important, but equally important is the additional intellectual curiosity this enables for researchers.” (Lance Smith, Associate Director of IT, Celgene)
    The Challenge
    § Slower time to results, due to wait times and longer run times on the fixed configurations available
    § Difficulty collaborating with external entities because of security and compliance issues
    § Inability to scale beyond the fixed number of cores available on premises
    The Solution
    § The company runs many HPC workloads on hundreds of Amazon EC2 instances and uses Amazon S3 and Amazon Glacier to store hundreds of terabytes of genomic data
    § Amazon VPC, AWS Identity and Access Management (IAM), and AWS Direct Connect let it collaborate securely
    The Result
    § HPC job times reduced from weeks to hours
    § More work done in parallel, increasing productivity
  • 27. HPC in Design & Engineering: Boom leverages Rescale and AWS to enable supersonic travel
    “Rescale’s ScaleX cloud platform is a game-changer for engineering. It gives Boom computing resources comparable to building a large on-premise HPC center. Rescale lets us move fast with minimal capital spending and resources overhead.” (Josh Krall, CTO & Co-Founder)
    § Simulated vortex lift with 200M-cell models on 512+ cores
    § Increased simulation throughput: 100 jobs in parallel with a 6x speedup per job → 600x speedup
    § Eliminated IT overhead, including server capital costs and in-house IT and software costs
    § Elastic HPC capacity and pay-as-you-go AWS clusters allow business agility and the ability to scale
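The 600x figure on this slide is a throughput number, not a per-job number. It multiplies the 100 concurrent jobs by the 6x per-job speedup, which assumes the baseline ran those jobs one after another. The small sketch below makes that assumption explicit and shows how the gain shrinks if the cluster can only run some of the jobs at once. The `concurrent_slots` parameter is an illustrative addition, not something from the slide.

```python
import math


def throughput_speedup(n_jobs: int, concurrent_slots: int, per_job_speedup: float) -> float:
    """Speedup of finishing n_jobs versus running them one at a time on the baseline.

    baseline time = n_jobs * T
    elastic time  = ceil(n_jobs / concurrent_slots) * (T / per_job_speedup)
    T, the baseline run time of one job, cancels out.
    """
    waves = math.ceil(n_jobs / concurrent_slots)  # rounds of jobs that must run back to back
    return n_jobs * per_job_speedup / waves


print(throughput_speedup(100, 100, 6.0))  # the slide's case: every job runs at once
print(throughput_speedup(100, 25, 6.0))   # hypothetical capped cluster: four waves
```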
  • 28. 1.1M vCPUs for Machine Learning: A group of researchers from Clemson University achieved a remarkable milestone while studying topic modeling, an important component of machine learning associated with natural language processing. They broke the record for the largest high-performance cluster in the cloud, using more than 1,100,000 vCPUs on Amazon EC2 Spot Instances in a single AWS Region. The graph highlights the elastic, automatic expansion of resources. Clemson took advantage of the new per-second billing for EC2 instances. The vCPU count is comparable to the core count of the largest supercomputers in the world. Architecture (diagram): provisioning and workflow automation software (CloudyCluster APIs, CCQ) takes a job script from S3 and drives a login node, the SLURM scheduler, Auto Scaling with Spot Fleet, S3, DynamoDB (DDB), and a VPC. https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
  • 29. Thank you, and how can I help you run HPC workloads on AWS? aws.amazon.com/hpc
  • 30. Pierre-Yves Aquilanti, Ph.D. – Senior HPC Specialized Solution Architect Anh Tran – Senior HPC Specialized Solution Architect Tuesday, August 7, 2018 HPC on AWS Deep Dive
  • 31. In the past: M1 § First launched in August 2006 § M1 instance § “One size fits all”
  • 32. Amazon EC2 Instances: optimize the price/performance of your HPC workloads with the widest range of compute instances § General purpose: M4, M5, M5d (new) § General purpose burstable: T2, T2 Unlimited § Compute optimized: C4, C5, C5d (new) § Memory optimized: R4, R5 (new), R5d (new), X1, X1e § Storage optimized: D2 (dense storage), H1, I2 and I3 (high I/O) § Accelerated computing: P2 and P3 (GPU compute), G2 and G3 (graphics intensive), F1 (FPGA) § EC2 Bare Metal: direct access to physical server resources
  • 33. Elastic Compute Cloud (EC2) Instance Naming: in c5.9xlarge, “c” is the instance family, “5” is the instance generation, and “9xlarge” is the instance size
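A minimal sketch of the naming convention above as a POSIX shell function. It assumes every type follows the pattern family letters, generation digits, optional attribute letters (such as the “d” in m5d), a dot, and a size; it is not validated against every instance type AWS offers.

```shell
# Split an EC2 instance type such as "c5.9xlarge" or "m5d.4xlarge" into
# family letter(s), generation digit(s), optional attribute suffix and size.
parse_instance_type() (
  type=$1
  prefix=${type%%.*}   # part before the dot, e.g. "m5d"
  size=${type#*.}      # part after the dot, e.g. "4xlarge"
  family=$(printf '%s\n' "$prefix" | sed 's/[0-9].*$//')
  generation=$(printf '%s\n' "$prefix" | sed 's/^[a-z]*\([0-9]*\).*$/\1/')
  attrs=$(printf '%s\n' "$prefix" | sed 's/^[a-z]*[0-9]*//')
  printf 'family=%s generation=%s attributes=%s size=%s\n' \
    "$family" "$generation" "${attrs:-none}" "$size"
)
```

For example, parse_instance_type m5d.4xlarge reports family m, generation 5, attribute d, and size 4xlarge.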
  • 34. Instance sizing: c5.18xlarge ≈ 2 x c5.9xlarge ≈ 4 x c5.4xlarge ≈ 8 x c5.2xlarge
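The “≈” signs above can be checked with a little arithmetic. The sketch below assumes the C5 sizing convention (large = 2 vCPUs, xlarge = 4 vCPUs, Nxlarge = 4N vCPUs); other families may differ, so confirm against the EC2 instance type documentation.

```shell
# vCPU count for a size suffix, assuming the C5 convention:
# large = 2, xlarge = 4, Nxlarge = 4*N vCPUs.
size_vcpus() (
  case $1 in
    large)   echo 2 ;;
    xlarge)  echo 4 ;;
    *xlarge) n=${1%xlarge}; echo $((n * 4)) ;;
    *)       echo "unknown size: $1" >&2; return 1 ;;
  esac
)

# How many instances of size $2 fit into one instance of size $1, by vCPUs.
# A non-zero remainder means the equivalence is only approximate.
equivalent_count() (
  big=$(size_vcpus "$1") || return 1
  small=$(size_vcpus "$2") || return 1
  echo "$((big / small)) remainder $((big % small))"
)
```

equivalent_count 18xlarge 9xlarge prints “2 remainder 0”, while equivalent_count 18xlarge 4xlarge prints “4 remainder 8”: four c5.4xlarge instances provide 64 of the 72 vCPUs, which is why the slide uses “≈” rather than “=”.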
  • 35. Original EC2 Host Architecture (diagram: a single server running the hypervisor, customer instances, storage, network, and management, security, and monitoring) § All resources were on the server § Instance goals: security, performance, familiarity
  • 36. EC2 C5 Instance (diagram: storage, network, and management, security, and monitoring move off the server onto Nitro, leaving the server to the hypervisor and customer instances) § Nearly 100% of the server's compute resources are available to customers' workloads § Improved security
  • 37. C5 Instances – Intel® Xeon® Scalable Processor § Intel Skylake @ 3.0 GHz (turbo to 3.5 GHz) § Supports AVX-512 § C-state controls § Nitro System, a combination of dedicated hardware and a lightweight hypervisor § Up to 25 Gbps network § C5 (“Skylake”, AVX-512): 72 vCPUs, 144 GiB memory, 12 Gbps to EBS § C4 (“Haswell”): 36 vCPUs, 60 GiB memory, 4 Gbps to EBS § C5 vs. C4: 2X vCPUs, 3X EBS throughput, 2.4X memory
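Code compiled for AVX-512 (for example with the Intel compiler's -xCORE-AVX512 option) fails with an illegal-instruction error on hosts without it, so a quick check before launching can help. This sketch looks for the avx512f flag in a cpuinfo-format file; the optional file argument exists only so the check can be tried against a saved copy.

```shell
# Succeed if the first "flags" line of a cpuinfo-format file (default
# /proc/cpuinfo) lists avx512f, the AVX-512 foundation instruction set.
has_avx512() (
  cpuinfo=${1:-/proc/cpuinfo}
  grep -m1 '^flags' "$cpuinfo" | grep -qw avx512f
)
```

Usage: has_avx512 && echo "AVX-512 available".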
  • 38. Performance Considerations § Test using real-world examples: use large cases for testing, and do not benchmark scalability using only small examples § MPI libraries: test with Intel MPI and Open MPI 3.0, and make use of available tunings § Domain decomposition: choose the number of cells per core to optimize either per-core efficiency or time to results § Network: use a placement group and enable enhanced networking
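Both network recommendations can be checked from the shell. The sketch below assumes the AWS CLI is configured and ethtool is installed on the instance; ena and ixgbevf are the enhanced networking drivers on EC2, while the Xen netfront driver reports itself as vif. The placement group name is a placeholder.

```shell
# Read `ethtool -i <iface>` output on stdin and report whether the driver
# is one of EC2's enhanced networking drivers (ena or ixgbevf).
enhanced_networking_driver() (
  driver=$(awk -F': ' '$1 == "driver" { print $2 }')
  case $driver in
    ena|ixgbevf) echo "enhanced ($driver)" ;;
    *)           echo "not enhanced (${driver:-unknown})"; return 1 ;;
  esac
)

# Create a cluster placement group (needs AWS CLI credentials).
create_cluster_pg() {
  aws ec2 create-placement-group --group-name "$1" --strategy cluster
}
```

Usage: ethtool -i eth0 | enhanced_networking_driver; then create_cluster_pg my-hpc-pg and launch the nodes with aws ec2 run-instances --placement GroupName=my-hpc-pg.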
  • 39. Application Compilation and Run § Can your application use “hybrid mode”, a mix of MPI and OpenMP? WRF is a great example § Are you using the Intel compiler (AWS instances use Intel processors)? Consider it for both compile time and runtime § Consider installing the Intel compiler on AWS: https://software.intel.com/en-us/articles/install-intel-parallel-studio-xe-on-amazon-web-services-aws
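A hybrid run splits each node's physical cores between MPI ranks and OpenMP threads. The helper below is a sketch that assumes one thread per physical core (for example 36 cores on a c5.18xlarge with Hyper-Threading disabled) and only computes the layout; the launcher flags in the usage line follow Intel MPI's mpirun and differ for Open MPI.

```shell
# Split each node's physical cores between MPI ranks and OpenMP threads.
# Arguments: number of nodes, physical cores per node, MPI ranks per node.
hybrid_layout() (
  nodes=$1 cores=$2 rpn=$3
  if [ $((cores % rpn)) -ne 0 ]; then
    echo "ranks per node must divide the core count" >&2
    return 1
  fi
  echo "OMP_NUM_THREADS=$((cores / rpn)) total_ranks=$((nodes * rpn))"
)
```

hybrid_layout 4 36 4 prints OMP_NUM_THREADS=9 total_ranks=16, which could be used as export OMP_NUM_THREADS=9; mpirun -np 16 -ppn 4 ./wrf.exe.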
  • 40. What’s a Virtual CPU (vCPU)? § A vCPU is typically one hyper-thread of an Intel physical core* § On Linux, all “A” threads are enumerated before the “B” threads § On Windows, threads are interleaved § Divide the vCPU count by 2 to get the core count § Cores by EC2 and RDS DB instance type: https://aws.amazon.com/ec2/virtualcores/ * The “T” family is special
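Under the Linux enumeration described above, the first thread of every core has the lowest CPU IDs. A small sketch, assuming exactly two threads per core (so it does not apply to the T family), that converts a vCPU count into a core count and the ID ranges of the “A” and “B” threads, suitable for pinning with taskset -c:

```shell
# Assumes 2 threads per core and Linux numbering where all "A" threads
# (0 .. cores-1) come before all "B" threads (cores .. vcpus-1).
physical_cores() (
  vcpus=$1
  cores=$((vcpus / 2))
  echo "cores=$cores a_threads=0-$((cores - 1)) b_threads=$cores-$((vcpus - 1))"
)
```

physical_cores 72 prints cores=36 a_threads=0-35 b_threads=36-71, so taskset -c 0-35 ./app would keep a process on one thread per core.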
  • 41. Disable Hyper-Threading (on the OS) § Useful for CPU-heavy applications § Use ‘lscpu’ to validate the layout § Disable Hyper-Threading without a reboot by taking every sibling thread after the first offline:
for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un); do
  # Writing 0 to a CPU's "online" file takes that logical CPU offline
  echo 0 | sudo tee /sys/devices/system/cpu/cpu${cpunum}/online
done
§ Or set grub to initialize only the first half of all threads by adding maxcpus to the kernel command line, e.g. for a 128-vCPU instance:
maxcpus=64
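After the sibling threads are offline, lscpu should report one thread per core. A sketch that extracts that field from lscpu output read on stdin, so it can be used as lscpu | threads_per_core:

```shell
# Print the value of the "Thread(s) per core" line from lscpu output.
threads_per_core() {
  awk -F: '$1 == "Thread(s) per core" { gsub(/[ \t]/, "", $2); print $2 }'
}
```

A result of 2 means Hyper-Threading is still active; 1 means only one thread per core is online.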
  • 42. Timekeeping Explained § Timekeeping in an instance is deceptively hard § gettimeofday(), clock_gettime(), QueryPerformanceCounter() § The TSC: a CPU counter, accessible from userspace; requires calibration; usable through the vDSO; invariant on Sandy Bridge+ processors § Xen pvclock: does not support the vDSO § On current-generation instances, use the TSC as the clocksource
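  • 43. Use TSC as clocksource § Check with: cat /sys/devices/system/clocksource/clocksource0/current_clocksource § Change with: echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource § Change at boot time (e.g. in /etc/default/grub) by adding clocksource=tsc to GRUB_CMDLINE_LINUX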
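A slightly safer variant of switching the clocksource, sketched under the assumption that the instance exposes the standard clocksource0 sysfs files: it only switches when tsc appears in available_clocksource. Writing to sysfs needs root, so run it from a root shell; the directory argument exists so the logic can be tried on a copy of those files.

```shell
# Switch to the TSC clocksource only if the kernel lists it as available,
# then print the clocksource now in use.
use_tsc() (
  root=${1:-/sys/devices/system/clocksource/clocksource0}
  if ! grep -qw tsc "$root/available_clocksource"; then
    echo "tsc not available; current: $(cat "$root/current_clocksource")" >&2
    return 1
  fi
  echo tsc > "$root/current_clocksource"
  cat "$root/current_clocksource"
)
```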
  • 44. Storage
  • 45. HPC Data Flow on AWS (diagram) § Data transfer between the corporate data center and AWS (ingress and egress): AWS Direct Connect, Internet/VPN, AWS Snowball, AWS Storage Gateway, ISV connectors, Amazon Kinesis Firehose, S3 Transfer Acceleration, Amazon CloudFront § Object storage: Amazon S3, with lifecycle policies to Amazon Glacier § Block storage on the EC2 instance: EBS and instance store § Shared file systems: EFS and others § 25 Gbps to S3
  • 46. Instance Store § Temporary block-level storage, physically attached to the host computer § Lifetime: data is lost when a drive fails, the instance stops, or the instance terminates; data persists on reboot § Instance store data-loss prevention: create RAID 1/5/6; move data to S3 or EBS; create a fault-tolerant file system
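Because RAID only protects instance store data against drive failure, not against the instance stopping or terminating, it complements rather than replaces copying results to S3 or EBS. This sketch prints (does not run) an mdadm command for RAID 1, 5, or 6 after checking the minimum device count; the /dev/nvme* names are examples, since device names vary by instance type.

```shell
# Print an mdadm command for a RAID 1/5/6 array over the given devices.
# Usage: raid_cmd LEVEL DEVICE...
raid_cmd() (
  level=$1; shift
  case $level in
    1) min=2 ;;
    5) min=3 ;;
    6) min=4 ;;
    *) echo "unsupported RAID level: $level" >&2; return 1 ;;
  esac
  if [ "$#" -lt "$min" ]; then
    echo "RAID $level needs at least $min devices, got $#" >&2
    return 1
  fi
  echo "mdadm --create /dev/md0 --level=$level --raid-devices=$# $*"
)
```

Usage: raid_cmd 5 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1, then review the printed command and run it with sudo.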
  • 47. EBS Volume Types § SSD: General Purpose SSD (gp2) balances price and performance for a wide variety of transactional data; Provisioned IOPS SSD (io1) for latency-sensitive transactional workloads § HDD: Throughput Optimized HDD (st1) for frequently accessed, throughput-intensive workloads; Cold HDD (sc1) for less frequently accessed data
  • 48. EBS – Elastic Block Store: two block storage options for EC2 instances, EBS and instance store (diagram: an EC2 instance's block device mapping exposes /dev/xvda, /dev/xvdb, /dev/xvdc, and /dev/xvdd, backed by instance store volumes ephemeral0 and ephemeral1 and by EBS volumes vol-xxxxxxxx)
  • 49. EBS Performance (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html, as of July 25, 2017)
Instance type | EBS-optimized by default | Max EBS bandwidth (Mbps)* | Expected throughput (MB/s)** | Max IOPS (16 KB I/O size)** | Max network bandwidth | 3-year Reserved $/hour (N. Virginia)
r4.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $1.600
m4.16xlarge | Yes | 10,000 | 1,250 | 65,000 | 25 Gb/s | $1.203
c5.18xlarge | Yes | 9,000 | 1,125 | 64,000 | 25 Gb/s | $1.928
g3.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $2.023
i3.16xlarge | Yes | 14,000 | 1,750 | 65,000 | 25 Gb/s | $2.112
f1.16xlarge | Yes | 14,000 | 1,750 | 75,000 | 25 Gb/s | $5.734
p2.16xlarge | Yes | 10,000 | 1,250 | 65,000 | 25 Gb/s | $6.392
x1.32xlarge | Yes | 10,000 | 1,250 | 65,000 | 10 Gb/s | $3.732
Choose the right instance, then RAID multiple EBS volumes together to reach its maximum EBS performance.
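The throughput column is the bandwidth column divided by 8 (14,000 Mbps / 8 = 1,750 MB/s). A sketch that uses that conversion to estimate how many volumes a RAID 0 stripe needs to reach an instance's EBS limit; the per-volume throughput is an input you take from the EBS volume type documentation, not a value assumed here.

```shell
# Estimate the number of EBS volumes needed to reach an instance's EBS limit.
# Arguments: instance max EBS bandwidth (Mbps), per-volume throughput (MB/s).
ebs_volumes_needed() (
  instance_mbps=$1 volume_mbs=$2
  instance_mbs=$((instance_mbps / 8))   # megabits/s -> megabytes/s
  # integer ceiling division
  echo $(( (instance_mbs + volume_mbs - 1) / volume_mbs ))
)
```

With a hypothetical 250 MB/s per volume, ebs_volumes_needed 14000 250 prints 7 for the r4.16xlarge row.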
  • 50. Storage Classes and Tiering on Amazon S3: Standard, Standard - Infrequent Access, and Amazon Glacier. Use cases across the tiers § Primary data § Big data analytics § Small objects § Temporary scratch space § Archive data § Deep/offline archives § Tape vaulting replacement § WORM-compliant data § File sync and share § Active archive § Enterprise backup § Media transcoding § Geo-redundancy/DR
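Tiering between these classes can be automated with an S3 lifecycle rule. A minimal sketch, assuming a results/ prefix and 30/90-day thresholds chosen purely for illustration (S3 requires objects to be at least 30 days old before a transition to Standard-IA); the bucket name in the usage line is a placeholder.

```shell
# Write an S3 lifecycle configuration that moves objects under a prefix to
# Standard-IA and later to Glacier.
# Arguments: output file, prefix, days to Standard-IA, days to Glacier.
write_lifecycle() (
  out=$1 prefix=$2 ia_days=$3 glacier_days=$4
  cat > "$out" <<EOF
{
  "Rules": [
    {
      "ID": "tier-$prefix",
      "Filter": { "Prefix": "$prefix" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": $ia_days, "StorageClass": "STANDARD_IA" },
        { "Days": $glacier_days, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
)
```

Usage: write_lifecycle lifecycle.json results/ 30 90, then aws s3api put-bucket-lifecycle-configuration --bucket my-hpc-bucket --lifecycle-configuration file://lifecycle.json.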
  • 51. File Systems on AWS § EFS – Elastic File System: distributed across multiple AZs, petabyte-scale, easy to bring up with no management § Build your own NFS: use for a POC, or for ephemeral data on instance store (i3.*) § Parallel file systems: build your own or use APN solutions
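EFS is mounted over NFS v4.1. This sketch prints the mount command with the NFS options recommended in the EFS documentation; the file system ID and Region are placeholders, and the instance's security group must allow NFS (TCP 2049) to the mount target.

```shell
# Print the NFSv4.1 mount command for an EFS file system.
# Arguments: file system ID, Region, local mount point.
efs_mount_cmd() (
  fs_id=$1 region=$2 mnt=$3
  echo "mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 ${fs_id}.efs.${region}.amazonaws.com:/ ${mnt}"
)
```

Usage: sudo mkdir -p /efs, then run the printed command with sudo.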
  • 52. Automation and Batch Processing
  • 53. Traditional Job Schedulers Integrate Easily: bring your scheduler to AWS, or build your own § IBM Platform LSF § Univa Grid Engine § Altair PBS Pro § SLURM § Design your own using AWS services § Do you actually need a scheduler?
  • 54. AWS Batch § Dynamically provisions resources § Plans, schedules, and executes jobs § No batch software to install. Focus on your applications and results!
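For parameter sweeps, AWS Batch array jobs run one child job per index and set AWS_BATCH_JOB_ARRAY_INDEX (starting at 0) in each child's environment. A sketch of a worker helper that maps that index to one line of a manifest of input cases; the manifest format and the job, queue, and definition names are hypothetical.

```shell
# Inside an AWS Batch array child, print the manifest line for this child.
# Line N+1 of the manifest corresponds to array index N.
select_case() (
  manifest=$1
  idx=${AWS_BATCH_JOB_ARRAY_INDEX:?not running as an array job}
  sed -n "$((idx + 1))p" "$manifest"
)
```

Submitting 100 such children could look like aws batch submit-job --job-name sweep --job-queue my-queue --job-definition my-solver --array-properties size=100.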
  • 55. HPC Automation and Orchestration: choose from several options to adapt your workloads § CfnCluster § AWS Batch § AWS-NICE DCV and EnginFrame § Build your own CloudFormation templates § ISV offerings on AWS Marketplace, or work with a systems integrator (SI)
  • 56. Launch a Cluster in Minutes § Cluster creation usually takes ~15 minutes § Completely managed by CloudFormation
$ cfncluster create mycluster
  • 57. CfnCluster Configuration Options § Operating system: Amazon Linux, CentOS 6, CentOS 7, Ubuntu 14.04 § Scheduler: Sun Grid Engine (SGE), PBS/Torque, SLURM § Storage size and IOPS § EBS and instance store encryption § Scaling speed and limits § Provisioning scripts
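These options live in CfnCluster's INI-style config file (by default ~/.cfncluster/config). The sketch below writes a minimal one; the key pair, VPC, subnet, instance types, Spot price, and queue sizes are placeholders, and base_os values such as alinux, centos6, centos7, and ubuntu1404 correspond to the operating systems listed. Check the key names against the CfnCluster documentation for your version.

```shell
# Write a minimal CfnCluster config (placeholder IDs; adjust before use).
write_cfncluster_config() (
  out=${1:-$HOME/.cfncluster/config}
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<'EOF'
[aws]
aws_region_name = us-east-1

[global]
cluster_template = default
update_check = true
sanity_check = true

[cluster default]
key_name = my-keypair
base_os = centos7
scheduler = slurm
master_instance_type = c5.xlarge
compute_instance_type = c5.18xlarge
initial_queue_size = 0
max_queue_size = 16
maintain_initial_size = false
cluster_type = spot
spot_price = 1.50
placement_group = DYNAMIC
vpc_settings = public

[vpc public]
vpc_id = vpc-xxxxxxxx
master_subnet_id = subnet-xxxxxxxx
EOF
)
```

Usage: write_cfncluster_config, edit the placeholders, then cfncluster create mycluster.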
  • 58. License Server Management § FLEXlm works natively on AWS § Each EC2 instance has a unique hostname and hardware (MAC) address that can’t be spoofed § Set the ENI (Elastic Network Interface) of your license server not to “Delete on termination” § This allows simple license failover and migration
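FLEXlm license files usually bind the server to an Ethernet address, which is why keeping the ENI (and therefore its MAC address) matters. A sketch of two helpers: one prints an interface's MAC from ip link output in the 12-hex-digit form typically used as a FLEXlm host ID, the other turns off Delete on termination for an ENI attachment. The ENI and attachment IDs are placeholders, and the AWS CLI must be configured.

```shell
# Print the MAC address from `ip link show <iface>` output on stdin,
# lower-case and without colons.
flexlm_hostid() {
  awk '$1 == "link/ether" { gsub(":", "", $2); print tolower($2); exit }'
}

# Keep the ENI (and its MAC) when the license server instance terminates.
# Arguments: ENI ID, attachment ID.
keep_eni() {
  aws ec2 modify-network-interface-attribute \
    --network-interface-id "$1" \
    --attachment "AttachmentId=$2,DeleteOnTermination=false"
}
```

Usage: ip link show eth0 | flexlm_hostid; keep_eni eni-xxxxxxxx eni-attach-xxxxxxxx.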
  • 59. HPC Architecture on AWS (diagram: a corporate data center exchanges data (ingress/egress) with an availability zone containing a master instance with EBS, where jobs are submitted with $ qsub job.sh, an Auto Scaling group of compute instances, a parallel FS, a local NFS, EFS, and S3) § Three file systems: EFS, local NFS, and parallel FS § Snapshots of EBS to S3 § Data tiering from the file systems to S3 § Auto Scaling adds capacity when needed
  • 60. POC using EFS and EBS (diagram: an r4.16xlarge instance in one availability zone with S3, EFS, and DynamoDB; mounted file systems: EBS at /scratch, EFS at /lib and /binary) 1. Copy data from S3 to EBS on startup 2. Start the job ($ ./run_job) 3. Use the EBS volumes while the job is running 4. Access the mounted EFS directories (/lib and /binary) while the job is running 5. Record pass/fail (in DynamoDB) 6. Update the data with the delta
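The six POC steps map onto a short shell driver. This is a hypothetical sketch: the bucket, the poc-results DynamoDB table (assumed to use JobId as its key), ./run_job, and the /scratch paths are placeholders, and DRY_RUN=1 prints the commands instead of running them. aws s3 sync copies only changed files, which fits the “update with delta” step.

```shell
# Run a command, or just print it when DRY_RUN=1.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One POC iteration. Arguments: S3 bucket, job ID.
poc_job() (
  bucket=$1 job_id=$2
  # 1. Copy input data from S3 to the EBS scratch volume
  maybe_run aws s3 sync "s3://$bucket/input" /scratch/input
  # 2-4. Run on EBS scratch; /lib and /binary come from the EFS mounts
  if maybe_run ./run_job /scratch/input; then status=PASS; else status=FAIL; fi
  # 5. Record pass/fail in DynamoDB
  maybe_run aws dynamodb put-item --table-name poc-results \
    --item "{\"JobId\":{\"S\":\"$job_id\"},\"Status\":{\"S\":\"$status\"}}"
  # 6. Sync only the changed output files back to S3
  maybe_run aws s3 sync /scratch/output "s3://$bucket/output"
)
```

Usage: DRY_RUN=1 poc_job my-bucket job-1 to review the commands, then poc_job my-bucket job-1.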
  • 61. Visualization (diagram: a corporate data center connected to a modeling cluster and a GPU instance in an availability zone) § Using a GPU-optimized instance and AWS-NICE DCV to visualize results
  • 62. Security Overview on AWS
  • 63. AWS Shared Responsibility Model
  • 64. Compliance Programs: global (SOC 1, SOC 2, SOC 3) and regional programs for the United States, Asia Pacific, and Europe. https://aws.amazon.com/compliance/pci-data-privacy-protection-hipaa-soc-fedramp-faqs/
  • 65. It is always YOUR data! § Customers choose where to place their data § AWS Regions are geographically isolated by design § Data is not replicated to other AWS Regions and does not move unless the customer tells us to move it § Customers always own their data and keep the ability to encrypt it, move it, and delete it. AWS Customer Agreement: https://aws.amazon.com/agreement/
  • 66. Ubiquitous, Fully Managed Encryption § Services: EBS, RDS, Amazon Redshift, S3, Amazon Glacier, Amazon EC2 § Encrypted in transit and at rest § Fully auditable with AWS CloudTrail § Restricted access with IAM § Keys: fully managed keys in KMS, imported keys, or your own KMI
  • 67. Media Destruction: from this to this (before-and-after photos of storage media)
  • 68. Cost Optimization
  • 69. EC2 Purchasing Options § On-Demand: pay for compute capacity by the second with no long-term commitments; suited to spiky workloads or to defining your needs § Reserved: make a 1- or 3-year commitment and receive a significant discount off On-Demand prices; suited to committed, steady-state usage § Spot: spare EC2 capacity at savings of up to 90% off On-Demand prices; suited to fault-tolerant, dev/test, time-flexible, stateless workloads § Per-second billing for EC2 Linux instances and EBS volumes
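With per-second billing, a short job costs its hourly price prorated by runtime. A sketch of that arithmetic, assuming the one-minute minimum that applies to per-second billing for Linux instances:

```shell
# Cost in dollars of one instance billed per second with a 60-second minimum.
# Arguments: hourly price in dollars, runtime in seconds.
instance_cost() (
  awk -v p="$1" -v s="$2" 'BEGIN { if (s < 60) s = 60; printf "%.4f\n", p * s / 3600 }'
)
```

At the c5.18xlarge On-Demand rate of $3.06 per hour, instance_cost 3.06 90 prints 0.0765, and a 30-second run is billed as 60 seconds, so instance_cost 3.06 30 prints 0.0510.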
  • 70. Cost Optimization: Weather Forecasting and Modeling (chart matching On-Demand, Spot, and Reserved Instances to workloads: forecasting runs at 00z, 06z, 12z, and 18z, daily forecasts, climate modeling, and weather events such as hurricanes)
  • 71. Spot Instance details § Options: Spot Fleet to maintain instance availability; Spot block durations (1–6 hours) for workloads that must run continuously § Commitment level: none * Compared to the On-Demand price, based on the specific EC2 instance type, Region, and Availability Zone
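Fault-tolerant Spot workloads usually watch for the interruption notice, which the instance metadata service exposes at spot/instance-action about two minutes before the instance is reclaimed (the path returns 404 until then). A sketch, in which checkpoint_and_exit is a hypothetical hook you would supply:

```shell
# Extract the "action" field (e.g. terminate, stop) from the notice JSON.
spot_action() {
  sed -n 's/.*"action"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Poll every 5 seconds; curl -f fails on the 404 returned before a notice.
watch_spot() {
  while :; do
    if notice=$(curl -sf http://169.254.169.254/latest/meta-data/spot/instance-action); then
      echo "Spot interruption: $(printf '%s\n' "$notice" | spot_action)"
      checkpoint_and_exit   # hypothetical: save state and exit cleanly
    fi
    sleep 5
  done
}
```

Run watch_spot & alongside the solver so a checkpoint can be written before the instance is reclaimed.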
  • 72. ec2FleetCompare – Spot example
$ ./ec2FleetCompare -n 20 -i c5.18xlarge
+--------+-------------+------+-----------+-------+------------+---------+---------+-------------------+-------------------+----------+----
| # INST | TYPE        | VCPU | VCPU FREQ | MEM   | NETWORK    | IS TYPE | IS SIZE | DEMAND/HOUR       | SPOT/HOUR         | SPOT SAV | DEM
+--------+-------------+------+-----------+-------+------------+---------+---------+-------------------+-------------------+----------+----
| 20     | c5.18xlarge | 72   | 3.0 Ghz   | 144.0 | 25 Gigabit | N/A     | N/A     | $61.20 ($3.06 ea) | $21.66 ($1.08 ea) | 65%      | $44
+--------+-------------+------+-----------+-------+------------+---------+---------+-------------------+-------------------+----------+----
65% Savings!
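The 65% figure follows from the two hourly totals: (61.20 - 21.66) / 61.20 ≈ 0.646. A one-line sketch of that calculation:

```shell
# Percentage saved by Spot versus On-Demand.
# Arguments: On-Demand $/hour, Spot $/hour (for the same fleet or instance).
spot_savings() (
  awk -v d="$1" -v s="$2" 'BEGIN { printf "%.0f%%\n", (d - s) / d * 100 }'
)
```

spot_savings 61.20 21.66 prints 65%, and the per-instance prices (3.06 and 1.08) give the same result.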
  • 73. Why use Spot – customer examples: “By using AWS Spot instances, we've been able to save 75% a month simply by changing four lines of code. It makes perfect sense for saving money when you're running continuous integration workloads or pipeline processing.” (Matthew Leventi, Lead Engineer, Lyft)
  • 74. AWS Data Centers: take a virtual tour of an AWS data center at aws.amazon.com/compliance/data-center