Modest scale HPC on Azure using CGYRO
Igor Sfiligoi – UC San Diego
Jeff Candy – General Atomics
A poster presented at SC20
What is CGYRO?
• CGYRO is a premier tool for multi-scale plasma turbulence simulation and has been in use by the fusion community for several years
• It is an Eulerian gyrokinetic solver that relies heavily on global FFT computations
• Fusion research is still very active
• Several aspects of fusion energy physics are still not well understood
• Experimental methods are essential for exploring new operational modes
• But simulations are used to validate basic theory, plan experiments, interpret results on present devices, and ultimately to design future devices.
Motivation for this exploratory work
• Leadership-class HPC centers are heavily sought after and typically over-subscribed
• Can we do Fusion research in other venues?
• Commercial Clouds offer an appealing option
• They promise immediate access to resources
• All you need is $$$
• But can they deliver?
• Are there true HPC-class resources available?
• Can we afford it?
Microsoft Azure and HPC
• Among the commercial Cloud providers,
Microsoft Azure has the most HPC-like resources
• Several instance types with InfiniBand connectivity
• The two most promising:
• NDv2 – 8x NVIDIA V100 GPUs with 100 Gbps EDR IB
• HBv2 – 120-core AMD EPYC CPUs with 200 Gbps HDR IB
Verifying IB performance
• CGYRO is extremely sensitive to network latency and throughput
• One could say that it is network-bound
• Microsoft Azure IB shows great characteristics using the OSU micro-benchmark tools (see the sketch below)
Measured network latencies in µs, as reported by the osu_latency tool.
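The deck does not include the benchmark invocation itself; the following is a minimal sketch, assuming an OpenMPI-style mpirun launch, a locally built osu_latency binary, and hypothetical host names, of how one might collect the point-to-point latency numbers between two IB-connected Azure VMs.

```python
#!/usr/bin/env python3
# Minimal sketch (not the actual scripts used): run the OSU point-to-point
# latency test between two IB-connected VMs and parse its output.
# The binary path, host names, and the use of mpirun are assumptions.
import subprocess

OSU_LATENCY = "./osu_latency"      # hypothetical path to the OSU Micro-Benchmarks binary
HOSTS = "hbv2-node1,hbv2-node2"    # hypothetical VM host names

def measure_latency():
    out = subprocess.run(
        ["mpirun", "-np", "2", "-host", HOSTS, OSU_LATENCY],
        check=True, capture_output=True, text=True,
    ).stdout
    latencies = {}
    for line in out.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip the tool's header lines
            continue
        size, lat_us = line.split()
        latencies[int(size)] = float(lat_us)
    return latencies

if __name__ == "__main__":
    lat = measure_latency()
    print(f"small-message latency: {lat.get(0, float('nan')):.2f} us")
```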
Submit environment
• Unlike HPC centers, Microsoft Azure does not provide
an HPC batch system or a shared file system out of the box
• But CycleCloud is available as a free add-on option
• CycleCloud provides several batch system options
• We chose SLURM, mostly due to our familiarity with that system (a minimal submission sketch follows below)
• Comes with ssh access and auto-scaling capabilities out-of-the-box
• More advanced options require the use of their API
• Initial setup relatively easy, but not trivial
• Mostly a documentation issue
• We also hit a couple of bugs in the advanced options (e.g. spot HPC use), since fixed
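As an illustration of how jobs can be driven through the CycleCloud-provisioned SLURM cluster, here is a minimal submission sketch; the partition name, node and task counts, and the launch line are placeholders rather than the exact configuration used for the CGYRO runs.

```python
#!/usr/bin/env python3
# Minimal sketch of submitting a multi-node job to the CycleCloud SLURM cluster.
# Partition name, node/task counts, and the launch command are placeholders.
import subprocess
import textwrap

job_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=cgyro-bench
    #SBATCH --partition=hpc          # hypothetical CycleCloud partition name
    #SBATCH --nodes=18               # e.g. 18x HBv2 instances
    #SBATCH --ntasks-per-node=120    # one MPI rank per HBv2 core (illustrative)
    #SBATCH --exclusive
    #SBATCH --time=04:00:00

    # Placeholder launch line; the real runs used the CGYRO/GACODE run scripts.
    srun ./run_cgyro
""")

with open("cgyro_bench.sbatch", "w") as f:
    f.write(job_script)

# CycleCloud's auto-scaling then provisions the requested nodes on demand.
result = subprocess.run(["sbatch", "cgyro_bench.sbatch"],
                        check=True, capture_output=True, text=True)
print(result.stdout.strip())         # e.g. "Submitted batch job <id>"
```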
Execution environment
• CycleCloud does most of the system/batch config
• Also comes with basic compiler and MPI config
• However, optimized for CPU instances
• No out-of-the-box GPU support
• To use the GPU instances we had to make some manual changes
• Create a GPU-enabled HPC VM image, and point CycleCloud to it
• Install the PGI compilers (note: now called the NVIDIA HPC SDK)
• Install and configure the MPI library
• We used the one bundled with the PGI compilers
• We used the head-node NFS shared filesystem setup
• Not a true HPC storage solution, but good enough for CGYRO (a quick environment sanity check is sketched below)
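One quick way to confirm that the manual changes took effect on a compute node is to probe for the GPU driver, the PGI/NVIDIA HPC SDK compiler, and the bundled MPI launcher. The sketch below only checks that the standard tools respond; their presence on PATH in the custom VM image is an assumption.

```python
#!/usr/bin/env python3
# Sanity-check sketch for the GPU execution environment: verify that the
# NVIDIA driver, the PGI/NVIDIA HPC SDK Fortran compiler, and the MPI launcher
# are all visible on a compute node. Assumes the tools are on PATH.
import shutil
import subprocess

def probe(cmd, flag):
    path = shutil.which(cmd)
    if path is None:
        return f"{cmd}: NOT FOUND"
    out = subprocess.run([cmd, flag], stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, text=True).stdout
    first = out.splitlines()[0] if out else "(no output)"
    return f"{cmd}: {first}"

if __name__ == "__main__":
    print(probe("nvidia-smi", "-L"))        # lists the V100 GPUs seen by the driver
    print(probe("nvfortran", "--version"))  # NVIDIA HPC SDK compiler (formerly PGI)
    print(probe("mpirun", "--version"))     # MPI launcher bundled with the SDK
```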
Benchmarking CGYRO on Azure – With real science
• The main “benchmarking tool” was a brand new, cutting-edge
CGYRO simulation:
• A multi-scale simulation
• N_RADIAL=1024, N_TOROIDAL=128
• https://github.com/scidac/atom-open-doc/blob/master/2020.11-SC20/multiscale_input/input.cgyro
• Most of the compute time in Azure was spent advancing the above simulation
• And most of that time used spot pricing (very little preemption incurred)
• We also ran some smaller test simulations for completeness
• nl03 and sh02, which represent more commonly used simulation profiles
• These benchmark tests used a minimal fraction of total resources
Multi-scale benchmark results on Azure
• We started with the GPU-based NDv2 instances (8x NVIDIA V100)
• But observed that a very high fraction of the time was spent in communication
• We thus switched the simulation to the CPU-based instances
• HC uses “traditional” Intel Xeon CPUs
• HBv2 uses the latest AMD EPYC CPUs
• Azure also has a well-defined per-hour price for each instance type, making for an easy cost-effectiveness comparison (the arithmetic is sketched after the table below)
• We focused on spot pricing, which seems feasible at these scales
Instance | Nodes | Total time | Comm. time | Total cost
NDv2     | 24    | 161s       | 139s       | $5.24
NDv2     | 8     | 369s       | 316s       | $4.00
HBv2     | 36    | 272s       | 104s       | $1.81
HBv2     | 18    | 441s       | 190s       | $1.47
HC       | 35    | 416s       | 113s       | $2.42
HC       | 18    | 763s       | 151s       | $2.28
Slower per node, so more nodes are used
AMD CPU-based HBv2 is the clear winner: comparable speed to NDv2, at much lower cost
All numbers represent one typical step during the simulation. Cost is computed using spot instance pricing.
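The cost column follows directly from the node count, the per-node spot rate, and the measured time per step. The sketch below reproduces that arithmetic; the dollar-per-node-hour values are illustrative figures back-solved from the table above, not quoted Azure prices.

```python
#!/usr/bin/env python3
# Sketch of the cost-effectiveness arithmetic behind the table:
#   cost per step = nodes * spot rate ($/node-hour) * step time (s) / 3600
# The rates below are illustrative values back-solved from the table,
# not official Azure spot price quotes.
SPOT_RATE = {      # approximate $/node-hour, for illustration only
    "NDv2": 4.88,
    "HBv2": 0.67,
    "HC":   0.60,
}

def cost_per_step(instance, nodes, step_time_s):
    """Dollar cost of one typical simulation step on a given allocation."""
    return nodes * SPOT_RATE[instance] * step_time_s / 3600.0

if __name__ == "__main__":
    # Node counts and per-step times from the multi-scale benchmark table.
    runs = [("NDv2", 24, 161), ("NDv2", 8, 369),
            ("HBv2", 36, 272), ("HBv2", 18, 441),
            ("HC", 35, 416), ("HC", 18, 763)]
    for instance, nodes, t in runs:
        print(f"{instance:4s} x{nodes:3d}: {t:4d}s/step -> ${cost_per_step(instance, nodes, t):.2f}")
```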
Comparing to on-prem HPC centers (multi-scale)
• To have a frame of reference, we also ran on
two on-prem HPC centers we had access to
• ORNL Summit – 6x NVIDIA V100 GPUs and 2x 100 Gbps IB per node
• NERSC Cori – Intel Xeon Phi (KNL) CPU and 56 Gbps IB per node
• The Azure CPU instances are comparable to Cori results
• Summit is significantly faster
• Its better networking shows
Instance | Nodes | Total time | Comm. time | Total cost
NDv2     | 24    | 161s       | 139s       | $5.24
NDv2     | 8     | 369s       | 316s       | $4.00
HBv2     | 36    | 272s       | 104s       | $1.81
HBv2     | 18    | 441s       | 190s       | $1.47
HC       | 35    | 416s       | 113s       | $2.42
HC       | 18    | 763s       | 151s       | $2.28

System | Nodes | Total time | Comm. time
Summit | 32    | 86s        | 67s
Cori   | 128   | 165s       | 62s
Cori   | 48    | 339s       | 160s
All numbers represent one typical step during the simulation. Cost is computed using spot instance pricing.
Smaller benchmark simulation – nl03
• Very similar insights when looking at the smaller nl03 test case
• AMD CPU-based HBv2 is still the most cost-effective
• And an excellent alternative to Cori
• NDv2 instances are still network-limited
• Summit again scales better
Instance | Nodes | Total time | Comm. time | Total cost
NDv2     | 16    | 121s       | 92s        | $2.64
NDv2     | 4     | 397s       | 293s       | $2.15
HBv2     | 36    | 87s        | 45s        | $0.58
HBv2     | 9     | 289s       | 64s        | $0.48
HC       | 24    | 223s       | 60s        | $0.89
HC       | 12    | 431s       | 96s        | $0.86

System | Nodes | Total time | Comm. time
Summit | 16    | 82s        | 46s
Cori   | 64    | 112s       | 46s
Cori   | 16    | 372s       | 120s
All numbers represent one typical step during the simulation. Cost is computed using spot instance pricing.
Suitability of spot pricing for HPC
• Spot instances definitely have downsides:
• lower availability and
• potential preemption during runtime
• But they typically cost 66%-88% less than “normal”, i.e. on-demand pricing
• CGYRO can deal with occasional preemption
• Using checkpointing every couple of hours, with minimal overhead (a rough estimate of the preemption cost is sketched below)
• At smaller node counts, we typically experienced at most a couple of preemptions per day
• But it does get worse with node count
• And we were not able to reliably exceed 24x NDv2 or 36x HBv2
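To make the "minimal overhead" claim concrete, here is a back-of-the-envelope sketch: with checkpoints written every couple of hours, each preemption loses on average about half a checkpoint interval plus a restart delay. The specific interval, preemption rate, and restart delay below are illustrative assumptions, not measured values.

```python
#!/usr/bin/env python3
# Rough estimate of spot-preemption overhead under periodic checkpointing.
# All inputs are illustrative assumptions (roughly matching the slide's
# "checkpoint every couple of hours, a couple preemptions per day").
def preemption_overhead(checkpoint_interval_h=2.0,
                        preemptions_per_day=2,
                        restart_delay_h=0.25):
    """Fraction of wall-clock time lost to redone work and restarts."""
    lost_per_preemption_h = checkpoint_interval_h / 2.0 + restart_delay_h
    return preemptions_per_day * lost_per_preemption_h / 24.0

def net_spot_saving(list_discount, overhead):
    """Saving vs on-demand once the redone work is paid for."""
    return 1.0 - (1.0 - list_discount) / (1.0 - overhead)

if __name__ == "__main__":
    ovh = preemption_overhead()
    print(f"expected overhead: {ovh:.1%}")       # ~10% with these assumptions
    for d in (0.66, 0.88):                       # spot is typically 66%-88% cheaper
        print(f"{d:.0%} list discount -> {net_spot_saving(d, ovh):.0%} net saving")
```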
Summary and conclusions
• We explored the feasibility of running CGYRO on Azure HPC resources
• With an emphasis on using them in spot mode
• We observed that
• CPU-only resources were very efficient, and
• running in spot mode was doable, with minimal side effects.
• The GPU-enabled resources were less cost effective
but allowed for higher scaling
• When a Cloud budget is available, Azure is an excellent place for CGYRO
Acknowledgements
• This presentation is based on the poster accepted and presented at SC20
https://sc20.supercomputing.org/presentation/?id=rpost106&sess=sess337
• The creation of this presentation was supported by the
U.S. Department of Energy under awards
DE-FG02-95ER54309 (General Atomics Theory grant) and
DE-SC0017992 (AToM SciDAC-4 project).
Computing resources were provided by the Oak Ridge Leadership Computing
Facility under Contract DE-AC05-00OR22725 (ALCC program) and the National
Energy Research Scientific Computing Center under Contract DE-AC02-05CH11231
Updates since the poster was created
• Microsoft Azure announced a few new improvements
• A new GPU-based HPC instance
• NDv4 – 8x NVIDIA A100 GPUs with 8x 200 Gbps IB (1.6 Tbps total) per node
• https://azure.microsoft.com/en-us/blog/bringing-ai-supercomputing-to-customers/
• An updated version of CycleCloud
• https://techcommunity.microsoft.com/t5/azure-compute/azure-cyclecloud-8-1-is-now-available/ba-p/1898011
