Presented at PEARC20.
This talk presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind the Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing
1. Demonstrating a Pre-Exascale,
Cost-Effective Multi-Cloud Environment
for Scientific Computing
Igor Sfiligoi – UC San Diego
together with
Frank Würthwein – UC San Diego
Steve Barnet, Vladimir Brik, Benedikt Riedel & David Schultz – UW Madison
2. How we extended
the IceCube computing environment
to Pre-Exascale size
using “cheap” Cloud resources
Igor Sfiligoi – UC San Diego
together with
Frank Würthwein – UC San Diego
Steve Barnet, Vladimir Brik, Benedikt Riedel & David Schultz – UW Madison
3. The big picture idea
• Demonstrate the viability of a Pre-Exascale High Throughput Computing
pool using GPU instances from all major Cloud providers
• Without using any reservations or long-term commitments,
and using only “cheap” instances
• Sustain the pool for a whole workday
• Running a real scientific application – IceCube simulation
• We went and did it
• Reached about 170 PFLOP32s
• Integrating an EFLOP32 hour
• At a $60k price tag
6. How does one detect neutrinos?
• Cherenkov light - Sonic boom with light
• Cherenkov light appears when a
charged particle travels through matter
faster than light can travel in that medium
Ice happens to be
a great medium
• Solid
• Yet transparent
Classic example from a nuclear reactor
7. • Using natural ice has its disadvantages
• Ice optical properties not homogeneous
(deposited over millennia)
(figure credit: Aya Ishihara)
8. • Detector calibration via ray-tracing simulation
• Several ExaFLOP months of compute needed
• GPUs very effective for computing
photon propagation (ray-tracing)
• Orders of magnitude more effective than CPUs
• OpenCL makes it easy to use any GPU type (see the sketch below)
• Intrinsically an HTC problem
(High Throughput Computing)
• Each photon independent (pleasantly parallel)
• A statistical problem
Science and compute meet
[Excerpt from Aya Ishihara's slide on ice optical properties: combining all the available information into the ice model and refining it until satisfactory agreement with the data is reached]
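To illustrate the OpenCL point above: the short sketch below, which assumes the pyopencl package and is not part of the IceCube code, simply enumerates whatever OpenCL-capable GPUs a node exposes, which is all a job needs in order to target any GPU type.

    # Minimal sketch: list the OpenCL GPU devices visible on a worker node.
    # Assumes the pyopencl package is installed; not part of the IceCube code base.
    import pyopencl as cl

    for platform in cl.get_platforms():
        for device in platform.get_devices():
            if device.type & cl.device_type.GPU:
                print(f"{platform.name}: {device.name}, "
                      f"{device.global_mem_size // 2**20} MiB global memory")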
10. IceCube is an OSG user
• IceCube already doing HTC at scale through the Open Science Grid (OSG)
• Which is based on HTCondor as its Workload Management System (WMS)
• HTCondor makes it trivial to join
any kind of compute resource into a single HTC pool (see the sketch below)
• No special network requirements, no shared anything
• Cloud resources one of the easiest to add
• The desired scale was beyond what they normally use
• Just added some additional hardware to handle the additional load
• All jobs could still run on either on-prem (OSG, XSEDE or PRP/k8s)
or Cloud resources
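Once on-prem and Cloud workers all report to the same collector, the pool looks like a single HTC system to users and operators alike. As a hedged illustration (the collector hostname is a hypothetical placeholder), the htcondor Python bindings can be used to see how many GPU slots are currently joined:

    # Sketch: count the GPU execute slots currently joined to the HTCondor pool.
    # The collector hostname is a hypothetical placeholder.
    import htcondor

    coll = htcondor.Collector("collector.icecube.example.org")
    ads = coll.query(
        htcondor.AdTypes.Startd,
        constraint="TotalGpus > 0",
        projection=["Machine", "TotalGpus"],
    )
    print(f"{len(ads)} GPU slots, {sum(int(ad['TotalGpus']) for ad in ads)} GPUs total")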
11. Provisioning using native Cloud APIs
[Figures: AWS and MS Azure region maps]
• Native Cloud APIs are easy to use
• But be aware that each Cloud region is completely independent
• Using group provisioning mechanisms in all three (see the sketch below)
• AWS – Spot Fleet
• Azure – Virtual Machine Scale Set (VMSS)
• Google – Instance Groups
• VM Image
• OS + HTCondor + CVMFS for software distribution
• Tailored for Cloud provider
• To use appropriate tools and drivers
• Parametrized to deal with region differences
(based on Cloud Metadata catalog info fetched during boot)
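As a rough illustration of the group provisioning mechanisms listed above, the sketch below requests a set of spot GPU instances in one AWS region via Spot Fleet using boto3; the AMI ID, IAM role and target capacity are hypothetical placeholders, and the equivalent calls for Azure VMSS and Google Instance Groups follow the same pattern.

    # Sketch: request a group of preemptible (spot) GPU instances in one AWS region.
    # AMI ID, IAM fleet role and target capacity are hypothetical placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.request_spot_fleet(
        SpotFleetRequestConfig={
            "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",
            "TargetCapacity": 100,       # number of instances to aim for
            "Type": "maintain",          # automatically replace preempted instances
            "LaunchSpecifications": [
                {
                    "ImageId": "ami-0123456789abcdef0",  # HTCondor+CVMFS worker image
                    "InstanceType": "g4dn.xlarge",       # one NVIDIA T4 per instance
                }
            ],
        }
    )
    print(response["SpotFleetRequestId"])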
12. Integrated an EFLOP hour
• Executed on a random Tuesday in February 2020
• Provisioning from all over the world (from AWS+Azure+Google Cloud)
• Reached about 170 fp32 PFLOPS
• Sustained for a whole workday (6h at plateau)
• Happened to integrate just about one fp32 EFLOP hour
13. Spending about $60k
Negligible egress costs
during post-run data export
Comparable to TACC’s Frontera, 1/3rd of OLCF’s Summit
• Requested only spot/preemptible instances (~1/3rd the cost of on-demand)
• T4s the most cost-effective – 40 PFLOPS at ~$1k/hour
• Also used P40 and V100 to reach 170 PFLOPS at ~$10k/hour (see the check below)
• No network related costs
• Only incoming traffic during the Cloud Burst
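The headline numbers on this slide are self-consistent, as the small back-of-the-envelope check below shows; it uses only figures quoted in the talk and ignores the ramp-up and ramp-down around the plateau.

    # Back-of-the-envelope check using only numbers quoted on this slide.
    plateau_pflops = 170      # sustained fp32 PFLOPS
    plateau_hours = 6         # hours at plateau
    cost_per_hour = 10_000    # ~$10k/hour for the T4+P40+V100 mix

    eflop_hours = plateau_pflops * plateau_hours / 1000    # ~1.0 fp32 EFLOP hour
    plateau_cost = plateau_hours * cost_per_hour            # ~$60k
    print(f"~{eflop_hours:.1f} EFLOP hours for roughly ${plateau_cost:,}")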
14. Minimal impact of preemption
• Preemption only a minor nuisance
• Incurred less than 10% waste
• A great deal for a 70% discount (see the sketch below)
• Recovery completely transparent to users
• IceCube jobs ideal for such an environment
• Job runtimes in the 20-60 minute range
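Combining the two figures above gives a feel for why preemption is a good trade-off; the sketch below is a rough estimate, not measured data.

    # Rough estimate of the effective cost of useful compute on spot instances,
    # using only the discount and waste figures quoted on this slide.
    spot_discount = 0.70      # spot/preemptible ~70% cheaper than on-demand
    waste_fraction = 0.10     # <10% of compute lost to preemptions

    effective_cost = (1 - spot_discount) / (1 - waste_fraction)
    print(f"Useful compute costs ~{effective_cost:.2f}x the on-demand price")  # ~0.33x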
16. Just-in-time input fetching
• Each IceCube job requires a unique input file
• Created asynchronously ahead of time, stored in IceCube@UW storage
• About 45 MB in size
• The job wrapper pulls the appropriate file as its first step (see the sketch below)
• We used a standard HTTP fetch using aria2
• Photon propagation code reads it from local disk
• Waste due to file transfer minimal
• A few seconds out of an O(1k) second job runtime
• Even for remote resources
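The fetch step described above can be sketched as follows; the storage URL, file names and the propagation launcher are hypothetical placeholders, not the actual IceCube wrapper.

    # Sketch of the just-in-time input fetch; URL, file names and the launcher
    # script are hypothetical placeholders.
    import subprocess

    input_url = "https://data.icecube.example.org/inputs/job_000123.i3.zst"
    local_file = "input.i3.zst"

    # aria2c pulls the ~45 MB input over HTTP, retrying on transient failures.
    subprocess.run(
        ["aria2c", "--max-tries=5", "-x", "4", "-o", local_file, input_url],
        check=True,
    )

    # The photon propagation code then reads the input from local disk.
    subprocess.run(["./run_photon_propagation.sh", local_file], check=True)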
17. Plenty of network capacity
• In preparation for the actual burst
demonstrated close to 100 Gbps
• See poster 197 for more details
“Demonstrating 100 Gbps in and out of the public Clouds”
• The IceCube Cloud Burst required only about
4 Gbps of WAN bandwidth on average
(see the consistency check below)
• Short peaks of up to 6 Gbps
[Plot: throughput in bytes per second]
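The ~4 Gbps average is roughly what one would expect from the numbers quoted earlier in the talk (~15k concurrent jobs, 20-60 minute runtimes, ~45 MB inputs); the sketch below is only a consistency check, not a measurement.

    # Rough consistency check of the observed ~4 Gbps average WAN traffic.
    concurrent_jobs = 15_000      # ~15k GPUs, one job per GPU
    mean_runtime_s = 30 * 60      # job runtimes in the 20-60 minute range
    input_mb = 45                 # per-job input size in MB

    job_starts_per_s = concurrent_jobs / mean_runtime_s        # ~8 jobs/second
    avg_gbps = job_starts_per_s * input_mb * 8 / 1000           # MB/s -> Gbit/s
    print(f"~{avg_gbps:.1f} Gbps of input traffic on average")  # roughly 3 Gbps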
19. The Cloud is an option
• We showed that there is plenty of capacity in the Cloud
(at least, that was the case in February 2020)
• We reached and sustained about 170 fp32 PFLOPS for a whole workday
• It is relatively easy to expand an HTC system into the Clouds
• Without any advanced planning or long-term commitments
• At a reasonable cost ($30k-$60k for an integrated fp32 EFLOP hour)
• While still running real scientific computing on it
• Nice experience, we are confident we could do this again anytime
• If interested, talk to me or OSG
20. Acknowledgments
This work was partially funded by
the US National Science Foundation (NSF)
under grants OAC-1941481, MPS-1148698,
OAC-1841530, OAC-1826967,
OAC-1904444 and OPP-1600823.
22. Integrated an EFLOP hour
• Executed on a random Tuesday in February 2020
• Provisioning from all over the world (from AWS+Azure+Google Cloud)
• Sustained for a whole workday (6h at plateau)
• Reached and sustained about 170 fp32 PFLOPS
• Happened to integrate just about one fp32 EFLOP hour
Quick note:
This was our 2nd Cloud Burst with IceCube.
The 1st one was executed in November 2019
and was aimed at showing peak capacity (not a sustained one),
without budget considerations.
It reached 380 fp32 PFLOPS at peak, over about 2 hours total.
More details in my ISC20 talk:
https://youtu.be/VNhQxIVJOXw
23. FLOPS is a good metric
• We talked about PFLOPS because it is easy to measure
• # GPUs * NVIDIA-provided specs (see the sketch below)
• But science output is what matters
• Turns out # jobs and # FLOPS correlate very well
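For completeness, the bookkeeping behind the PFLOPS figure is just a weighted sum; the per-GPU numbers below are NVIDIA's published peak fp32 figures, while the GPU counts are hypothetical placeholders, not the actual provisioning mix.

    # Sketch of the PFLOPS bookkeeping: pool capacity = sum over GPU types of
    # (# GPUs) * (NVIDIA-published peak fp32 TFLOPS). GPU counts are hypothetical.
    peak_fp32_tflops = {"T4": 8.1, "P40": 11.8, "V100": 14.0}
    gpu_counts = {"T4": 10_000, "P40": 2_000, "V100": 3_000}

    pool_pflops = sum(gpu_counts[g] * peak_fp32_tflops[g] for g in gpu_counts) / 1000
    print(f"~{pool_pflops:.0f} fp32 PFLOPS")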