Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing
PRESENTED BY
Andrew J. Younge
PIs: James H. Laros III, Kevin Pedretti, Si Hammond
Sandia National Laboratories
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
Outline
• Overview of Vanguard
• Prototype HPC Architectures
• Astra – Petascale ARM platform
• ATSE – Advanced Tri-lab Software Environment
• R&D Opportunities
• Conclusion
Vanguard Overview
Vanguard Program: Advanced Architecture Prototype Systems
• Prove viability of advanced technologies for NNSA integrated codes, at scale
• Expand the HPC-ecosystem by developing emerging unproven technologies
• Is it viable for future ATS/CTS platforms?
• Increase technology AND integrator choices
• Buy down risk and increase technology and vendor choices for future platforms
• Ability to accept higher risk allows for more/faster technology advancement
• Lowers/eliminates mission risk and significantly reduces investment
• Jointly address hardware and software challenges
• First prototype platform targeting ARM
Where Vanguard Fits
Test Beds | Vanguard | ATS/CTS Platforms
Higher Risk, Greater Architectural Choices (test beds) to Greater Stability, Larger Scale (ATS/CTS platforms)
Test Beds
• Small testbeds (~10-100 nodes)
• Breadth of architectures
• Brave users
Vanguard
• Larger-scale experimental systems
• Focused efforts to mature new technologies
• Broader user-base
• Demonstrate viability for production use
• NNSA Tri-lab resource
ATS/CTS Platforms
• Leadership-class systems (Petascale, Exascale, ...)
• Advanced technologies, sometimes first-of-kind
• Broad user-base
• Production use
Sandia’s NNSA/ASC ARM Platforms
[Timeline figure, Sept 2011 to TODAY, leading to Future ASC Platforms; color legend: 2015, 2017, 2018, Retired]
• Hammer: APM/HPE, Applied Micro X-Gene-1, 47 nodes
• Sullivan: Cavium/Penguin, Cavium ThunderX1, 32 nodes
• Mayer: Cavium HPE/Comanche, pre-GA Cavium ThunderX2, 47 nodes
• Astra (Vanguard Phase 1): Petascale ARM Platform, delivery Aug/Sep 2018
  HPE Apollo 70, Cavium ThunderX2, Mellanox ConnectX-5, Switch-IB2, 2592 nodes
Astra
“Per aspera ad astra”
Demonstrate viability of ARM for U.S. DOE Supercomputing
2.3 PFLOPs peak
885 TB/s memory bandwidth peak
332 TB memory
1.2 MW
per aspera ad astra
through difficulties to the stars
Vanguard-Astra Compute Node Building Block
Dual socket Cavium ThunderX2
• CN99xx
• 28 cores @ 2.0 GHz
8 DDR4 controllers per socket
One 8 GB DDR4-2666 dual-rank DIMM per controller
Mellanox EDR InfiniBand ConnectX-5 VPI OCP
Tri-Lab Operating System Stack based on RedHat 7.5+
HPE Apollo 70 Cavium TX2 Node
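The headline system figures (2.3 PFLOPs, 885 TB/s, 332 TB) can be reproduced from this node building block and the 2592-node count. The sketch below is a back-of-the-envelope check, not an official derivation. It assumes 8 double-precision FLOPs per core per cycle (two 128-bit FMA pipes), which the slides do not state; the other inputs come from the slides or from the DDR4 channel width of 64 bits.

```python
# Back-of-the-envelope check of Astra's peak figures.
NODES = 2592                  # compute nodes (from the slides)
SOCKETS_PER_NODE = 2          # dual-socket ThunderX2
CORES_PER_SOCKET = 28
CLOCK_HZ = 2.0e9              # 2.0 GHz
FLOPS_PER_CYCLE = 8           # ASSUMPTION: 2 x 128-bit FMA pipes, double precision
CHANNELS_PER_SOCKET = 8       # one DIMM per channel
DIMM_GB = 8
TRANSFERS_PER_SEC = 2666e6    # DDR4-2666
BYTES_PER_TRANSFER = 8        # 64-bit DDR4 data bus


def peak_pflops() -> float:
    """Peak double-precision rate of the whole system in PFLOPs."""
    cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    return cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e15


def memory_tb() -> float:
    """Total DRAM capacity in TB (decimal)."""
    dimms = NODES * SOCKETS_PER_NODE * CHANNELS_PER_SOCKET
    return dimms * DIMM_GB / 1000


def peak_bandwidth_tbs() -> float:
    """Aggregate peak memory bandwidth in TB/s."""
    channels = NODES * SOCKETS_PER_NODE * CHANNELS_PER_SOCKET
    return channels * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e12


print(f"{peak_pflops():.2f} PFLOPs, {memory_tb():.0f} TB, {peak_bandwidth_tbs():.0f} TB/s")
```

Under these assumptions the results round to the values quoted on the Astra slide.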
Vanguard-Astra Compute Node
[Diagram: two Cavium ThunderX2 sockets (ARM v8.1, 28 cores @ 2.0 GHz each), each attached to 8 GB DDR4-2666 dual-rank (DR) DIMMs]
8 DDR4 channels/socket, 1 DIMM/channel
Each socket has its own PCIe Gen3 x8 link to NIC
Mellanox ConnectX-5 OCP Network Interface: 1 EDR link, 100 Gbps
Management Ethernet: 1 Gbps
Vanguard-Astra System Packaging
HPE Apollo 70 Chassis: 4 nodes
Astra
18 chassis/rack
72 nodes/rack
3 IB switches/rack
(one 36-port switch
per 6 chassis)
36 compute racks
(9 scalable units, each 4 racks)
2592 compute nodes
(5184 TX2 processors)
3 IB spine switches
(each 540-port)
HPE Apollo 70 Rack
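The packaging numbers are mutually consistent, which a few lines of arithmetic can confirm. This is an illustrative check using only counts from the slide; the function and key names are made up for the example.

```python
# Consistency check of the Astra packaging counts (all inputs from the slide).
NODES_PER_CHASSIS = 4
CHASSIS_PER_RACK = 18
CHASSIS_PER_SWITCH = 6        # one 36-port switch per 6 chassis
COMPUTE_RACKS = 36
RACKS_PER_SCALABLE_UNIT = 4
SOCKETS_PER_NODE = 2


def packaging() -> dict:
    """Derive rack-, scalable-unit-, and system-level counts."""
    nodes_per_rack = NODES_PER_CHASSIS * CHASSIS_PER_RACK
    switches_per_rack = CHASSIS_PER_RACK // CHASSIS_PER_SWITCH
    compute_nodes = nodes_per_rack * COMPUTE_RACKS
    return {
        "nodes_per_rack": nodes_per_rack,
        "switches_per_rack": switches_per_rack,
        "nodes_per_switch": NODES_PER_CHASSIS * CHASSIS_PER_SWITCH,
        "compute_nodes": compute_nodes,
        "processors": compute_nodes * SOCKETS_PER_NODE,
        "scalable_units": COMPUTE_RACKS // RACKS_PER_SCALABLE_UNIT,
        "nodes_per_scalable_unit": nodes_per_rack * RACKS_PER_SCALABLE_UNIT,
        "l1_switches": switches_per_rack * COMPUTE_RACKS,
    }


for key, value in packaging().items():
    print(f"{key}: {value}")
```

The derived 288 nodes per scalable unit and 108 rack-level switches match the figures quoted on the infrastructure and network topology slides.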
Vanguard-Astra Infrastructure
Login & Service Nodes
• 4 login/compilation nodes
• 3 Lustre routers to connect to external Sandia filesystem(s)
• 2 general service nodes
Interconnect
• EDR InfiniBand in fat tree topology
• 2:1 oversubscribed for compute nodes
• 1:1 full bandwidth for in-platform Lustre storage
System Management
• Dual HA management nodes running HPE Performance Software – Cluster Manager (HPCM)
• Ethernet management network, connects to all nodes
• One boot server per scalable unit (288 nodes)
In-platform Storage
• All-flash Lustre storage system
• 403 TB usable capacity
• 244 GB/s throughput
Network Topology
[Diagram: three 540-Port Switches (#1 to #3), drawn as three groups of internal switches (Switch 2.1 to 2.30 with 3.1 to 3.18, Switch 2.31 to 2.60 with 3.19 to 3.36, Switch 2.61 to 2.90 with 3.37 to 3.54), above L1 switches Switch 1.1 to Switch 1.108, each serving 24 nodes]
108 L1 switches * 24 nodes/switch = 2592 compute nodes
Mellanox Switch-IB2 EDR, Radix 36 switches, 3 level fat tree, 2:1 taper at L1
Each L1 switch has 4 links
to each 540-port switch
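The 2:1 taper follows directly from how the 36 ports of each L1 switch are split. The sketch below works through that port budget; it uses only numbers from the slide, and the variable names are illustrative.

```python
# Port budget of one L1 (leaf) switch and the resulting taper.
RADIX = 36                 # ports per Switch-IB2 switch
NODES_PER_L1 = 24          # downlinks to compute nodes
SPINE_SWITCHES = 3         # 540-port switches
LINKS_PER_SPINE = 4        # links from each L1 switch to each 540-port switch
L1_SWITCHES = 108
SPINE_PORTS = 540

uplinks_per_l1 = SPINE_SWITCHES * LINKS_PER_SPINE      # uplinks per L1 switch
ports_used_per_l1 = NODES_PER_L1 + uplinks_per_l1      # should fill the radix
taper = NODES_PER_L1 / uplinks_per_l1                  # downlinks per uplink
spine_ports_for_compute = L1_SWITCHES * LINKS_PER_SPINE
spine_ports_remaining = SPINE_PORTS - spine_ports_for_compute

print(f"uplinks per L1 switch: {uplinks_per_l1}")
print(f"ports used per L1 switch: {ports_used_per_l1} of {RADIX}")
print(f"taper at L1: {taper:.0f}:1")
print(f"ports per 540-port switch used by compute uplinks: {spine_ports_for_compute}")
print(f"ports per 540-port switch not used by compute uplinks: {spine_ports_remaining}")
```

The slide does not say how the remaining ports on each 540-port switch are allocated, so the last figure is only the arithmetic remainder.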
Vanguard-Astra Advanced Power & Cooling
[Cooling diagram labels: 18.5C WB, 20.0C, 20.0C, 1.5C approach]
Projected power of the system by component
Per constituent rack type (W) and total (kW); nominal is the linpack load

Rack type    Wall (W)  Peak (W)  Nominal (W)  Idle (W)  Racks  Wall (kW)  Peak (kW)  Nominal (kW)  Idle (kW)
Node racks   39888     35993     33805        6761      36     1436.0     1295.8     1217.0        243.4
MCS300       10500     7400      7400         170       12     126.0      88.8       88.8          2.0
Network      12624     10023     9021         9021      3      37.9       30.1       27.1          27.1
Storage      11520     10000     10000        1000      2      23.0       20.0       20.0          2.0
Utility      8640      5625      4500         450       1      8.6        5.6        4.5           0.5
Total                                                          1631.5     1440.3     1357.3        274.9
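The totals in the table are per-rack power multiplied by rack count and summed over rack types. A short sketch of that calculation, with the per-rack values copied from the table and the function name chosen for illustration:

```python
# Recompute the projected power totals from the per-rack figures.
# Each entry: (rack type, rack count, per-rack watts for wall, peak, nominal, idle).
RACK_POWER_W = [
    ("Node racks", 36, (39888, 35993, 33805, 6761)),
    ("MCS300", 12, (10500, 7400, 7400, 170)),
    ("Network", 3, (12624, 10023, 9021, 9021)),
    ("Storage", 2, (11520, 10000, 10000, 1000)),
    ("Utility", 1, (8640, 5625, 4500, 450)),
]
STATES = ("wall", "peak", "nominal", "idle")


def totals_kw(rows=RACK_POWER_W) -> dict:
    """Sum racks * per-rack watts for each power state, in kW."""
    sums = [0.0] * len(STATES)
    for _name, racks, watts in rows:
        for i, per_rack_w in enumerate(watts):
            sums[i] += racks * per_rack_w / 1000.0
    return dict(zip(STATES, sums))


for state, kw in totals_kw().items():
    print(f"{state}: {kw:.1f} kW")
```

The recomputed totals agree with the table to within 0.1 kW; the small differences arise because the table sums already-rounded kW values. The nominal figure for the node racks (1217.0 kW) corresponds to the 1.2 MW quoted for the 36 compute racks.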
Extreme Efficiency:
• Total 1.2 MW in the 36 compute racks is cooled by only 12 fan coils
• These coils are cooled without compressors year round. No evaporative water at all for almost 6000 hours a year
• 99% of the compute racks' heat never leaves the cabinet, yet the system doesn’t require the internal plumbing of liquid disconnects and cold plates running across all CPUs and DIMMs
• Builds on work by NREL and Sandia: https://www.nrel.gov/esif/partnerships-jc.html
ATSE – Advanced Tri-lab Software
Environment
Advanced Tri-lab Software Environment Goals
Build an open, modular, extensible, community-inspired, and vendor-adaptable ecosystem
Prototype new technologies that may improve the DOE ASC computing environment (e.g., ML frameworks, containers, VMs, etc.)
Leverage existing efforts
• Tri-lab OS (TOSS)
• OpenHPC & other programming environments
• Exascale Computing Project (ECP) software technologies
ATSE stack timeline:
• Aug’17: Tri-lab Arm software team formed
• Dec’17: ATSE design doc
• Jul’18: Initial release target
• Sep’18: First use on Vanguard-Astra
Tri-Lab Software Effort for ARM
Accelerate ARM ecosystem for ASC computing
• Prove viability for ASC integrated codes running at scale
• Harden compilers, math libraries, tools, communication libraries
• Heavily templated C++, Fortran 2003/2008, Gigabyte+ binaries, long compiles
• Optimize performance, verify expected results
Build integrated software stack
• Programming environment (compilers, math libs, tools, MPI, OMP, SHMEM, I/O, ...)
• Low-level OS (optimized Linux, network, filesystems, containers/VMs, ...)
• Job scheduling and management (WLM, app launcher, user tools, ...)
• System management (boot, system monitoring, image management, ...)
Improve 0 to 60 time... ARM system arrival to useful work done
ARM Tri-lab Software Environment (ATSE)
[Diagram: ATSE stack layers, listed bottom to top]
• Vanguard Hardware
• Base OS Layer: Vendor OS, TOSS, Open OS (e.g. OpenSUSE)
• Cluster Middleware (e.g. Lustre, SLURM)
• ATSE Packaging: Native Installs, Containers, Virtual Machines
• User-facing Programming Env: ATSE Programming Environment “Product” for Vanguard (platform-optimized builds, common-look-and-feel across platforms)
• NNSA/ASC Application Portfolio
Integrate Components from Many Sources
[Diagram: component sources and build servers feeding the ATSE Packager, which produces ATSE Packages]
• TOSS
• RHEL
• EPEL
• OpenHPC
• Vendor Software
• Open Build Server
• Koji Build Server
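One way to picture a packager that integrates components from many sources is a priority-ordered lookup: for each requested package, take it from the first source that provides it. The sketch below is purely hypothetical and is not the ATSE Packager's actual design; the priority order, the catalog contents, and the `resolve` function are all invented for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# HYPOTHETICAL catalogs in an assumed priority order. The package names are
# placeholders and do not describe what each real source ships.
SOURCES: List[Tuple[str, Set[str]]] = [
    ("TOSS", {"kernel", "lustre-client"}),
    ("RHEL", {"kernel", "glibc"}),
    ("OpenHPC OBS", {"openmpi", "hdf5"}),
    ("vendor", {"vendor-compiler"}),
]


def resolve(
    packages: Iterable[str],
    sources: List[Tuple[str, Set[str]]] = SOURCES,
) -> Tuple[Dict[str, str], List[str]]:
    """Map each package to the first source that offers it; collect the rest."""
    chosen: Dict[str, str] = {}
    missing: List[str] = []
    for pkg in packages:
        for source_name, catalog in sources:
            if pkg in catalog:
                chosen[pkg] = source_name
                break
        else:
            # No source in the list provides this package.
            missing.append(pkg)
    return chosen, missing


chosen, missing = resolve(["kernel", "glibc", "openmpi", "no-such-package"])
print(chosen)
print(missing)
```

A first-match policy keeps the result deterministic when two sources offer the same package; a real packager would also need to handle versions and dependencies, which this sketch ignores.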
ATSE Diagram (from SNL Feb 12 TOSS meeting)
Key: Open Source, Closed Source, Integrator Provided, Limited Distribution, ATSE Activity
Draft ATSE Timeline for 2018
March
• Continue software stack explorations and gap analysis on testbeds
• Set up Open Build server and replicate OHPC package builds for aarch64
April – May
• Develop ATSE Packager framework, with the ability to pull packages from TOSS, RHEL, OpenHPC OBS, vendor, and other sources
• Identify initial component list
July
• Initial ATSE release 2018.0 on Mayer
• Lab-distribution version: TOSS BaseOS + (ATSE-GCC | ATSE-ARM | ATSE-*)
• Open-distribution version: SUSE and/or CentOS BaseOS + ATSE-GCC
Q3 2018
• Linux kernel optimization and HPC patches
• Basic VM & container support
Q4 2018
• ATSE 2018.1 release
• Initial upstream push to OpenHPC
Astra Status and Research
Acceptance Plan – Maturing the Stack
Cavium Arm64 Providing Best-of-Class Memory Bandwidth
[Chart: STREAM TRIAD results for TX2 DDR4-2400, SkyLake 8160, Trinity Haswell, and Trinity KNL DDR]
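For readers unfamiliar with the benchmark, the STREAM Triad kernel computes a[i] = b[i] + scalar * c[i] and counts three 8-byte words of memory traffic per element (two reads, one write). The sketch below only illustrates the kernel and the bandwidth formula. Pure Python is dominated by interpreter overhead, so the number it prints says nothing about the hardware; the real benchmark is compiled C or Fortran, normally run with OpenMP threads.

```python
import time


def triad(a, b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth_gbs(n, seconds, bytes_per_word=8):
    """Bandwidth in GB/s: two reads plus one write per element."""
    return 3 * bytes_per_word * n / seconds / 1e9


def measure(n=200_000, scalar=3.0):
    """Time one triad pass over n-element arrays and report GB/s."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    triad(a, b, c, scalar)
    elapsed = time.perf_counter() - start
    return a, triad_bandwidth_gbs(n, elapsed)


result, gbs = measure()
print(f"triad over {len(result)} elements: {gbs:.3f} GB/s (interpreter-bound)")
```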
Network Bandwidth on ThunderX2 + Mellanox MLX5 EDR with Socket Direct
[Diagram: Node 1 and Node 2, each with Socket 1 and Socket 2 attached to one MLX5 EDR NIC (devices MLX5_0 and MLX5_3), joined by 1 network link; traffic labeled Pair 1 and Pair 2]
Socket Direct – Each socket has dedicated path to the NIC
OSU MPI Multi-Network Bandwidth
Arm64 + EDR providing
• > 12 GB/s between nodes
• > 75M messages/sec
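A rough link budget shows why a dedicated PCIe path from each socket matters here. The sketch below compares the raw rate of a PCIe Gen3 x8 link with a 100 Gbps EDR link; it uses the standard PCIe Gen3 parameters (8 GT/s per lane, 128b/130b encoding) and ignores packet and protocol overhead, so the figures are upper bounds rather than measurements.

```python
# Raw link rates: PCIe Gen3 x8 per socket versus one EDR InfiniBand link.
PCIE_GEN3_TRANSFERS_PER_LANE = 8.0e9   # 8 GT/s per lane
PCIE_GEN3_ENCODING = 128 / 130         # 128b/130b line encoding
LANES_PER_SOCKET = 8                   # each socket has an x8 link to the NIC
EDR_BITS_PER_SEC = 100e9               # 1 EDR link, 100 Gbps
MEASURED_BYTES_PER_SEC = 12e9          # "> 12 GB/s" reported on the slide


def pcie_gen3_bytes_per_sec(lanes: int) -> float:
    """Raw PCIe Gen3 rate for the given lane count, before protocol overhead."""
    return PCIE_GEN3_TRANSFERS_PER_LANE * PCIE_GEN3_ENCODING * lanes / 8


edr_bytes_per_sec = EDR_BITS_PER_SEC / 8
one_socket = pcie_gen3_bytes_per_sec(LANES_PER_SOCKET)
both_sockets = pcie_gen3_bytes_per_sec(2 * LANES_PER_SOCKET)

print(f"PCIe Gen3 x8 (one socket):   {one_socket / 1e9:.2f} GB/s")
print(f"PCIe Gen3 x8 (both sockets): {both_sockets / 1e9:.2f} GB/s")
print(f"EDR link:                    {edr_bytes_per_sec / 1e9:.2f} GB/s")
print(f"Measured share of EDR rate:  {MEASURED_BYTES_PER_SEC / edr_bytes_per_sec:.0%}")
```

On these numbers a single x8 link cannot fill the EDR link, while the two x8 links together can, which is consistent with the reported bandwidth of more than 12 GB/s between nodes.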
Mini-App Performance on Cavium ThunderX2
ThunderX2 providing high memory bandwidth
• 6 channels (Skylake) vs. 8 in ThunderX2
• See this in MiniFE SpMV and STREAM Triad
Slower compute reflects less optimization in software stack
• Examples – Non-SpMV kernels in MiniFE and LULESH
• GCC and ARM versus Intel compiler
[Chart: Speedup over Haswell E5-2680v3 (axis 0 to 2.5) for ThunderX2, Skylake 8160, and Haswell E5-2680 on MiniFE Solve GF/s, MiniFE SpMV GF/s, STREAM Triad, and LULESH Solve FOM]
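Sparse matrix-vector multiply (SpMV) tracks memory bandwidth because each stored nonzero is read once and used in a single multiply-add. The sketch below is a generic SpMV over the compressed sparse row (CSR) format, shown only to illustrate the access pattern; it is not MiniFE's code, and the small test matrix is made up.

```python
from typing import List, Sequence


def csr_spmv(
    values: Sequence[float],
    col_idx: Sequence[int],
    row_ptr: Sequence[int],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # One multiply-add per nonzero: few FLOPs per byte loaded.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 tridiagonal matrix:
# [[4, 1, 0],
#  [1, 4, 1],
#  [0, 1, 4]]
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```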
R&D Areas
Leverage containers and virtual machines
• Support for machine learning frameworks
• ARMv8.1 includes new virtualization extensions, SR-IOV
• Working with Singularity on full container solution
Evaluating parallel filesystems + I/O systems @ scale
• GlusterFS, Ceph, BeeGFS, Sandia Data Warehouse, …
Resilience studies over Astra lifetime
Improved MPI thread support, matching acceleration
OS optimizations for HPC @ scale
• Exploring HPC-tuned Linux kernels to non-Linux lightweight kernels and multi-kernels
• Arm-specific optimizations
Conclusion
Vanguard allows the DOE to take necessary risks to ensure a
healthy HPC ecosystem for future production mission platforms
Increase technology choices
Prove ability to run multi-physics production applications at scale
Tri-lab software stack effort to mature ARM for ASC computing
Harden compilers, math libs, and tools
Optimize performance, verify expected results
Increase modularity and openness of software stack
Support traditional HPC and emerging AI + ML workloads
Sandia now a member of Linaro HPC SIG!
Questions?
ajyoung@sandia.gov
Virt - From x86 to ARM64
▪ What opportunities and challenges do we face when moving from an x86 world to an ARM world?
▪ Virtual Machines
▪ Near-native HPC performance with VMs possible in x86
– Type1 & Type2 hypervisors
– Hobbes / Palacios VM
▪ Avoid legacy issues with x86?
▪ Anything new we can do with ARM?
▪ Containers
▪ How do I build containers on my x86 laptop that run on Astra?
▪ Focus on ABI compatibility
▪ Leverage industry/enterprise without losing HPC focus
per aspera ad astra
through difficulties to the stars

More Related Content

What's hot

An Open Hardware CPU written in VHDL, synthesized with Open Source tools
An Open Hardware CPU written in VHDL, synthesized with Open Source toolsAn Open Hardware CPU written in VHDL, synthesized with Open Source tools
An Open Hardware CPU written in VHDL, synthesized with Open Source toolsGanesan Narayanasamy
 
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureBruno Cornec
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stackinside-BigData.com
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnectinside-BigData.com
 
Lenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
Lenovo HPC: Energy Efficiency and Water-Cool-Technology InnovationsLenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
Lenovo HPC: Energy Efficiency and Water-Cool-Technology Innovationsinside-BigData.com
 
Isn’t it Ironic that a Redfish is software defining you
Isn’t it Ironic that a Redfish is software defining you Isn’t it Ironic that a Redfish is software defining you
Isn’t it Ironic that a Redfish is software defining you Bruno Cornec
 
OpenPOWER foundation update new executive director and bright open future_i...
OpenPOWER  foundation update  new executive director and bright open future_i...OpenPOWER  foundation update  new executive director and bright open future_i...
OpenPOWER foundation update new executive director and bright open future_i...Ganesan Narayanasamy
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishBruno Cornec
 
BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64Linaro
 
Redfish & python redfish
Redfish & python redfishRedfish & python redfish
Redfish & python redfishRené Ribaud
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...Linaro
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...Edge AI and Vision Alliance
 
Linux on RISC-V with Open Source Hardware (Open Source Summit Japan 2020)
Linux on RISC-V with Open Source Hardware (Open Source Summit Japan 2020)Linux on RISC-V with Open Source Hardware (Open Source Summit Japan 2020)
Linux on RISC-V with Open Source Hardware (Open Source Summit Japan 2020)Drew Fustini
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopLinaro
 

What's hot (20)

An Open Hardware CPU written in VHDL, synthesized with Open Source tools
An Open Hardware CPU written in VHDL, synthesized with Open Source toolsAn Open Hardware CPU written in VHDL, synthesized with Open Source tools
An Open Hardware CPU written in VHDL, synthesized with Open Source tools
 
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Lenovo HPC Strategy Update
Lenovo HPC Strategy UpdateLenovo HPC Strategy Update
Lenovo HPC Strategy Update
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined Infrastructure
 
ARM HPC Ecosystem
ARM HPC EcosystemARM HPC Ecosystem
ARM HPC Ecosystem
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stack
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
Lenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
Lenovo HPC: Energy Efficiency and Water-Cool-Technology InnovationsLenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
Lenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
 
Isn’t it Ironic that a Redfish is software defining you
Isn’t it Ironic that a Redfish is software defining you Isn’t it Ironic that a Redfish is software defining you
Isn’t it Ironic that a Redfish is software defining you
 
OpenPOWER foundation update new executive director and bright open future_i...
OpenPOWER  foundation update  new executive director and bright open future_i...OpenPOWER  foundation update  new executive director and bright open future_i...
OpenPOWER foundation update new executive director and bright open future_i...
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live Redfish
 
BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64
 
Redfish & python redfish
Redfish & python redfishRedfish & python redfish
Redfish & python redfish
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
 
Linux on RISC-V with Open Source Hardware (Open Source Summit Japan 2020)
Linux on RISC-V with Open Source Hardware (Open Source Summit Japan 2020)Linux on RISC-V with Open Source Hardware (Open Source Summit Japan 2020)
Linux on RISC-V with Open Source Hardware (Open Source Summit Japan 2020)
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing Hadoop
 

Similar to Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing - Linaro Arm HPC Workshop

NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputinginside-BigData.com
 
Maxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorialMaxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorialmadhuinturi
 
WETEC HP Integrity Servers
WETEC HP Integrity ServersWETEC HP Integrity Servers
WETEC HP Integrity ServersEddy Jennekens
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Eric Van Hensbergen
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersRyousei Takano
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionNVIDIA Taiwan
 
Stellaris® 9000 Family of ARM® Cortex™-M3
Stellaris® 9000 Family of ARM® Cortex™-M3 Stellaris® 9000 Family of ARM® Cortex™-M3
Stellaris® 9000 Family of ARM® Cortex™-M3 Premier Farnell
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...NETWAYS
 
Ibm power sales bootcamp
Ibm power sales bootcampIbm power sales bootcamp
Ibm power sales bootcampsolarisyougood
 
2 Sessione - Macchine virtuali per la scalabilità di calcolo per velocizzare ...
2 Sessione - Macchine virtuali per la scalabilità di calcolo per velocizzare ...2 Sessione - Macchine virtuali per la scalabilità di calcolo per velocizzare ...
2 Sessione - Macchine virtuali per la scalabilità di calcolo per velocizzare ...Jürgen Ambrosi
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLinside-BigData.com
 
TULIPP overview
TULIPP overviewTULIPP overview
TULIPP overviewTulipp. Eu
 
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationHiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationVEDLIoT Project
 
12.) fabric (your next data center)
12.) fabric (your next data center)12.) fabric (your next data center)
12.) fabric (your next data center)Jeff Green
 

Similar to Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing - Linaro Arm HPC Workshop (20)

NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputing
 
Maxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorialMaxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorial
 
WETEC HP Integrity Servers
WETEC HP Integrity ServersWETEC HP Integrity Servers
WETEC HP Integrity Servers
 
Hp Integrity Servers
Hp Integrity ServersHp Integrity Servers
Hp Integrity Servers
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Stellaris® 9000 Family of ARM® Cortex™-M3
Stellaris® 9000 Family of ARM® Cortex™-M3 Stellaris® 9000 Family of ARM® Cortex™-M3
Stellaris® 9000 Family of ARM® Cortex™-M3
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
 
Ibm power sales bootcamp
Ibm power sales bootcampIbm power sales bootcamp
Ibm power sales bootcamp
 
2 Sessione - Macchine virtuali per la scalabilità di calcolo per velocizzare ...
2 Sessione - Macchine virtuali per la scalabilità di calcolo per velocizzare ...2 Sessione - Macchine virtuali per la scalabilità di calcolo per velocizzare ...
2 Sessione - Macchine virtuali per la scalabilità di calcolo per velocizzare ...
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
TULIPP overview
TULIPP overviewTULIPP overview
TULIPP overview
 
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationHiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
12.) fabric (your next data center)
12.) fabric (your next data center)12.) fabric (your next data center)
12.) fabric (your next data center)
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Palestra IBM-Mack Zvm linux
Palestra  IBM-Mack Zvm linux  Palestra  IBM-Mack Zvm linux
Palestra IBM-Mack Zvm linux
 
No[1][1]
No[1][1]No[1][1]
No[1][1]
 
L05 parallel
L05 parallelL05 parallel
L05 parallel
 

More from Linaro

Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaLinaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...Linaro
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramLinaro
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNLinaro
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...Linaro
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...Linaro
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionLinaro
 
HKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersHKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersLinaro
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightLinaro
 
HKG18-TR12 - LAVA for LITE Platforms and Tests
HKG18-TR12 - LAVA for LITE Platforms and TestsHKG18-TR12 - LAVA for LITE Platforms and Tests
HKG18-TR12 - LAVA for LITE Platforms and TestsLinaro
 
HKG18-419 - OpenHPC on Ansible
HKG18-419 - OpenHPC on AnsibleHKG18-419 - OpenHPC on Ansible
HKG18-419 - OpenHPC on AnsibleLinaro
 

Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing - Linaro Arm HPC Workshop

  • 1. Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing. Presented by Andrew J. Younge. PIs: James H. Laros III, Kevin Pedretti, Si Hammond. Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
  • 2. Outline
      • Overview of Vanguard
      • Prototype HPC Architectures
      • Astra – Petascale ARM platform
      • ATSE – Advanced Tri-lab Software Environment
      • R&D Opportunities
      • Conclusion
  • 4. Vanguard Program: Advanced Architecture Prototype Systems
      • Prove viability of advanced technologies for NNSA integrated codes, at scale
      • Expand the HPC-ecosystem by developing emerging unproven technologies
      • Is it viable for future ATS/CTS platforms?
      • Increase technology AND integrator choices
      • Buy down risk and increase technology and vendor choices for future platforms
      • Ability to accept higher risk allows for more/faster technology advancement
      • Lowers/eliminates mission risk and significantly reduces investment
      • Jointly address hardware and software challenges
      • First prototype platform targeting ARM
  • 5. Where Vanguard Fits
      Diagram: Test Beds, Vanguard, ATS/CTS Platforms. Axis labels: Higher Risk, Greater Architectural Choices; Greater Stability, Larger Scale.
      Test Beds
      • Small testbeds (~10-100 nodes)
      • Breadth of architectures
      • Brave users
      Vanguard
      • Larger-scale experimental systems
      • Focused efforts to mature new technologies
      • Broader user-base
      • Demonstrate viability for production use
      • NNSA Tri-lab resource
      ATS/CTS Platforms
      • Leadership-class systems (Petascale, Exascale, ...)
      • Advanced technologies, sometimes first-of-kind
      • Broad user-base
      • Production use
  • 7. Sandia’s NNSA/ASC ARM Platforms
      Timeline figure, from Sept 2011 to today and on to future ASC platforms (legend entries: Retired, 2015, 2017, 2018):
      • Hammer: APM/HPE X-Gene-1 (Applied Micro X-Gene-1, 47 nodes)
      • Sullivan: Cavium/Penguin ThunderX1 (Cavium ThunderX1, 32 nodes)
      • Mayer: Cavium HPE/Comanche ThunderX2 (Pre-GA Cavium ThunderX2, 47 nodes)
      • Astra (Vanguard): Petascale ARM platform, delivery Aug/Sep 2018; HPE Apollo 70, Cavium ThunderX2, Mellanox ConnectX-5, Switch-IB2, 2592 nodes
  • 8. Astra: “Per aspera ad astra” (through difficulties to the stars)
      Demonstrate viability of ARM for U.S. DOE Supercomputing
      • 2.3 PFLOPs peak
      • 885 TB/s memory bandwidth peak
      • 332 TB memory
      • 1.2 MW
  • 9. Vanguard-Astra Compute Node Building Block (HPE Apollo 70 Cavium TX2 node)
      • Dual socket Cavium Thunder-X2
          • CN99xx
          • 28 cores @ 2.0 GHz
      • 8 DDR4 controllers per socket
      • One 8 GB DDR4-2666 dual-rank DIMM per controller
      • Mellanox EDR InfiniBand ConnectX-5 VPI OCP
      • Tri-Lab Operating System Stack based on RedHat 7.5+
  • 10. Vanguard-Astra Compute Node
      Node diagram:
      • Two Cavium Thunder-X2 sockets, each ARM v8.1, 28 cores @ 2.0 GHz
      • 8 DDR4 channels/socket, 1 DIMM/channel; each DIMM is 8 GB DDR4-2666 DR
      • Each socket has its own PCIe Gen3 x8 link to the NIC
      • Mellanox ConnectX-5 OCP network interface: 1 EDR link, 100 Gbps
      • Management Ethernet: 1 Gbps
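The per-node memory figures on this slide can be tied back to the system totals quoted on the Astra slide (332 TB memory, 885 TB/s peak memory bandwidth). The following is a minimal sketch of that arithmetic, not something taken from the deck. It assumes the standard 64-bit (8-byte) DDR4 data bus per channel and decimal units (1 TB = 1000 GB); the variable names are illustrative.

```python
# Per-node memory capacity and peak bandwidth, scaled to the full system.
# Inputs come from the slides unless marked as an assumption.
SOCKETS_PER_NODE = 2
CHANNELS_PER_SOCKET = 8       # 8 DDR4 channels/socket
DIMMS_PER_CHANNEL = 1         # 1 DIMM/channel
DIMM_GB = 8                   # 8 GB DDR4-2666 DR
TRANSFERS_PER_SEC = 2666e6    # DDR4-2666
BYTES_PER_TRANSFER = 8        # assumption: standard 64-bit DDR4 data bus
NODES = 2592                  # compute nodes in Astra

dimms_per_node = SOCKETS_PER_NODE * CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL
node_memory_gb = dimms_per_node * DIMM_GB
node_bw_gbs = (SOCKETS_PER_NODE * CHANNELS_PER_SOCKET
               * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER) / 1e9

system_memory_tb = node_memory_gb * NODES / 1000
system_bw_tbs = node_bw_gbs * NODES / 1000

print(f"DIMMs per node:        {dimms_per_node}")
print(f"Memory per node:       {node_memory_gb} GB")
print(f"Peak bandwidth/node:   {node_bw_gbs:.1f} GB/s")
print(f"System memory:         {system_memory_tb:.0f} TB")
print(f"System peak bandwidth: {system_bw_tbs:.0f} TB/s")
```

Under these assumptions the system totals round to the 332 TB and 885 TB/s figures on the Astra slide. These are theoretical peaks; measured STREAM numbers will be lower.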
  • 11. Vanguard-Astra System Packaging
      HPE Apollo 70 chassis: 4 nodes
      HPE Apollo 70 rack:
      • 18 chassis/rack
      • 72 nodes/rack
      • 3 IB switches/rack (one 36-port switch per 6 chassis)
      Astra:
      • 36 compute racks (9 scalable units, each 4 racks)
      • 2592 compute nodes (5184 TX2 processors)
      • 3 IB spine switches (each 540-port)
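The packaging numbers above multiply out to the node and processor counts, and together with the per-socket figures from the compute node slides they give the 2.3 PFLOPs peak. A minimal sketch of that calculation follows. The value of 8 double-precision FLOPs per core per cycle is an assumption (two 128-bit FMA-capable SIMD pipes per core) and is not stated in the slides.

```python
# Packaging arithmetic for Astra, using the figures on the slides.
NODES_PER_CHASSIS = 4
CHASSIS_PER_RACK = 18
COMPUTE_RACKS = 36
RACKS_PER_SCALABLE_UNIT = 4
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 28
CLOCK_HZ = 2.0e9
# Assumption (not on the slides): 8 double-precision FLOPs/cycle/core,
# i.e. two 128-bit SIMD pipes, each doing a fused multiply-add.
FLOPS_PER_CYCLE = 8

nodes_per_rack = NODES_PER_CHASSIS * CHASSIS_PER_RACK
nodes = nodes_per_rack * COMPUTE_RACKS
scalable_units = COMPUTE_RACKS // RACKS_PER_SCALABLE_UNIT
nodes_per_scalable_unit = nodes_per_rack * RACKS_PER_SCALABLE_UNIT
processors = nodes * SOCKETS_PER_NODE
cores = processors * CORES_PER_SOCKET
peak_pflops = cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e15

print(f"nodes/rack:          {nodes_per_rack}")
print(f"compute nodes:       {nodes}")
print(f"scalable units:      {scalable_units} x {nodes_per_scalable_unit} nodes")
print(f"TX2 processors:      {processors}")
print(f"cores:               {cores}")
print(f"peak (assumed rate): {peak_pflops:.2f} PFLOPs")
```

The 288 nodes per scalable unit matches the "one boot server per scalable unit (288 nodes)" figure on the infrastructure slide.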
  • 12. Vanguard-Astra Infrastructure
      Login & Service Nodes
      • 4 login/compilation nodes
      • 3 Lustre routers to connect to external Sandia filesystem(s)
      • 2 general service nodes
      Interconnect
      • EDR InfiniBand in fat tree topology
      • 2:1 oversubscribed for compute nodes
      • 1:1 full bandwidth for in-platform Lustre storage
      System Management
      • Dual HA management nodes running HPE Performance Software – Cluster Manager (HPCM)
      • Ethernet management network, connects to all nodes
      • One boot server per scalable unit (288 nodes)
      In-platform Storage
      • All-flash Lustre storage system
      • 403 TB usable capacity
      • 244 GB/s throughput
  • 13. Network Topology
      Diagram: three 540-port switches (#1, #2, #3); switch labels 2.1 to 2.90 and 3.1 to 3.54, drawn in three groups (2.1-2.30 with 3.1-3.18, 2.31-2.60 with 3.19-3.36, 2.61-2.90 with 3.37-3.54); L1 switches 1.1 to 1.108, each with 24 nodes.
      • 108 L1 switches * 24 nodes/switch = 2592 compute nodes
      • Mellanox Switch-IB2 EDR, radix 36 switches, 3 level fat tree, 2:1 taper at L1
      • Each L1 switch has 4 links to each 540-port switch
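The 2:1 taper follows directly from the port counts on this slide. As a minimal sketch, the check below counts downlinks and uplinks on one L1 switch and confirms that they fit in a radix-36 switch. It only uses numbers from the slide; the variable names are illustrative.

```python
# Port budget of one L1 (leaf) switch in the Astra fat tree.
L1_SWITCHES = 108
NODES_PER_L1 = 24           # downlinks to compute nodes
SPINE_SWITCHES = 3          # 540-port switches
LINKS_PER_SPINE = 4         # links from each L1 switch to each 540-port switch
SWITCH_RADIX = 36           # ports per Switch-IB2 switch
SPINE_PORTS = 540

uplinks_per_l1 = SPINE_SWITCHES * LINKS_PER_SPINE
ports_used_per_l1 = NODES_PER_L1 + uplinks_per_l1
taper = NODES_PER_L1 / uplinks_per_l1
compute_nodes = L1_SWITCHES * NODES_PER_L1
# Ports on each 540-port switch consumed by L1 uplinks.
spine_ports_from_l1 = L1_SWITCHES * LINKS_PER_SPINE

print(f"uplinks per L1 switch: {uplinks_per_l1}")
print(f"ports used per L1:     {ports_used_per_l1} of {SWITCH_RADIX}")
print(f"taper at L1:           {taper:.0f}:1")
print(f"compute nodes:         {compute_nodes}")
print(f"L1 links per spine:    {spine_ports_from_l1} of {SPINE_PORTS}")
```

With 24 downlinks and 12 uplinks, every port of a radix-36 L1 switch is used, and each 540-port switch still has ports left beyond the L1 uplinks.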
  • 14. Vanguard-Astra Advanced Power & Cooling
      Cooling diagram labels: 18.5C WB, 20.0C, 20.0C, 1.5C approach

      Projected power of the system by component
      Per constituent rack type (W):
      Component     Wall     Peak     Nominal (linpack)   Idle    Racks
      Node racks    39888    35993    33805               6761    36
      MCS300        10500    7400     7400                170     12
      Network       12624    10023    9021                9021    3
      Storage       11520    10000    10000               1000    2
      Utility       8640     5625     4500                450     1
      Total (kW):
      Component     Wall     Peak     Nominal (linpack)   Idle
      Node racks    1436.0   1295.8   1217.0              243.4
      MCS300        126.0    88.8     88.8                2.0
      Network       37.9     30.1     27.1                27.1
      Storage       23.0     20.0     20.0                2.0
      Utility       8.6      5.6      4.5                 0.5
      Total         1631.5   1440.3   1357.3              274.9

      Extreme Efficiency:
      • Total 1.2 MW in the 36 compute racks is cooled by only 12 fan coils
      • These coils are cooled without compressors year round. No evaporative water at all for almost 6000 hours a year
      • 99% of the compute racks' heat never leaves the cabinet, yet the system doesn’t require the internal plumbing of liquid disconnects and cold plates running across all CPUs and DIMMs
      • Builds on work by NREL and Sandia: https://www.nrel.gov/esif/partnerships-jc.html
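The total (kW) columns in the power table are the per-rack wattages multiplied by the rack counts. The sketch below recomputes them from the per-rack values on the slide. The dictionary layout and names are illustrative, and recomputed values can differ from the slide by about 0.1 kW because the per-rack figures are already rounded.

```python
# Recompute total power (kW) per state from per-rack power (W) and rack counts.
STATES = ("wall", "peak", "nominal", "idle")

# component: (per-rack watts in STATES order, number of racks), from the slide
PER_RACK_W = {
    "Node racks": ((39888, 35993, 33805, 6761), 36),
    "MCS300":     ((10500, 7400, 7400, 170), 12),
    "Network":    ((12624, 10023, 9021, 9021), 3),
    "Storage":    ((11520, 10000, 10000, 1000), 2),
    "Utility":    ((8640, 5625, 4500, 450), 1),
}

totals_kw = {state: 0.0 for state in STATES}
for component, (watts, racks) in PER_RACK_W.items():
    component_kw = [w * racks / 1000 for w in watts]
    for state, kw in zip(STATES, component_kw):
        totals_kw[state] += kw
    print(component.ljust(12), "  ".join(f"{kw:8.1f}" for kw in component_kw))

print("Total".ljust(12), "  ".join(f"{totals_kw[s]:8.1f}" for s in STATES))
```

The node racks' nominal (linpack) total of roughly 1217 kW is the source of the 1.2 MW figure quoted for the 36 compute racks.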
  • 15. ATSE – Advanced Tri-lab Software Environment
  • 16. Advanced Tri-lab Software Environment Goals
      • Build an open, modular, extensible, community-inspired, and vendor-adaptable ecosystem
      • Prototype new technologies that may improve the DOE ASC computing environment (e.g., ML frameworks, containers, VMs, etc.)
      • Leverage existing efforts
          • Tri-lab OS (TOSS)
          • OpenHPC & other programming environments
          • Exascale Computing Project (ECP) software technologies
      ATSE stack timeline:
      • Aug’17: Tri-lab Arm software team formed
      • Dec’17: ATSE design doc
      • Jul’18: Initial release target
      • Sep’18: First use on Vanguard-Astra
  • 17. Tri-Lab Software Effort for ARM
      Accelerate ARM ecosystem for ASC computing
      • Prove viability for ASC integrated codes running at scale
      • Harden compilers, math libraries, tools, communication libraries
      • Heavily templated C++, Fortran 2003/2008, Gigabyte+ binaries, long compiles
      • Optimize performance, verify expected results
      Build integrated software stack
      • Programming environment (compilers, math libs, tools, MPI, OMP, SHMEM, I/O, ...)
      • Low-level OS (optimized Linux, network, filesystems, containers/VMs, ...)
      • Job scheduling and management (WLM, app launcher, user tools, ...)
      • System management (boot, system monitoring, image management, ...)
      Improve 0 to 60 time... ARM system arrival to useful work done
  • 18. ARM Tri-lab Software Environment (ATSE)
      Stack diagram, top to bottom:
      • NNSA/ASC Application Portfolio
      • User-facing Programming Env
      • ATSE Packaging: Virtual Machines, Native Installs, Containers
      • ATSE Programming Environment “Product” for Vanguard: platform-optimized builds, common look-and-feel across platforms
      • Cluster Middleware, e.g. Lustre, SLURM
      • Base OS Layer: Vendor OS, TOSS, Open OS (e.g. OpenSUSE)
      • Vanguard Hardware
      Key: Open Source, Closed Source, Integrator Provided, Limited Distribution, ATSE Activity
  • 19. Integrate Components from Many Sources
      Diagram labels: TOSS, RHEL, EPEL, Vendor Software, OpenHPC, Open Build Server, Koji Build Server, ATSE Packager, ATSE Packages
      ATSE diagram (from SNL Feb 12 TOSS meeting), top to bottom:
      • NNSA/ASC Application Portfolio
      • User-facing Programming Env
      • ATSE Packaging: Virtual Machines, Native Installs, Containers
      • ATSE Programming Environment “Product” for Vanguard: platform-optimized builds, common look-and-feel across platforms
      • Cluster Middleware, e.g. Lustre, SLURM
      • Base OS Layer: Vendor OS, TOSS, Open OS (e.g. OpenSUSE)
      • Vanguard Hardware
      Key: ATSE Activity, Closed Source, Integrator Provided, Limited Distribution, Open Source
  • 20. Draft ATSE Timeline for 2018
      March
      • Continue software stack explorations and gap analysis on testbeds
      • Set up OpenBuild server and replicate OHPC package builds for aarch64
      April – May
      • Develop ATSE Packager framework, with the ability to pull packages from TOSS, RHEL, OpenHPC OBS, vendor, and other sources
      • Identify initial component list
      July
      • Initial ATSE release 2018.0 on Mayer
      • Lab-distribution version: TOSS BaseOS + (ATSE-GCC | ATSE-ARM | ATSE-*)
      • Open-distribution version: SUSE and/or CentOS BaseOS + ATSE-GCC
      Q3 2018
      • Linux kernel optimization and HPC patches
      • Basic VM & container support
      Q4 2018
      • ATSE 2018.1 release
      • Initial upstream push to OpenHPC
  • 21. Astra Status and Research
  • 22. Acceptance Plan – Maturing the Stack
  • 23. Cavium Arm64 Providing Best-of-Class Memory Bandwidth. Chart: STREAM Triad; series labels: TX2 DDR4-2400, SkyLake 8160, Trinity Haswell, Trinity KNL DDR.
  • 24. Network Bandwidth on ThunderX2 + Mellanox MLX5 EDR with Socket Direct
      Diagram: Node 1 and Node 2, each with Socket 1 and Socket 2 attached to one MLX5 EDR NIC (devices MLX5_0 and MLX5_3); the nodes are joined by 1 network link and tested as Pair 1 and Pair 2.
      • Socket Direct: each socket has a dedicated path to the NIC
      • OSU MPI Multi-Network Bandwidth
      • Arm64 + EDR providing > 12 GB/s between nodes
      • > 75M messages/sec
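A rough link-rate comparison helps explain why both sockets' PCIe paths matter here. The sketch below compares the raw rate of one PCIe Gen3 x8 link with one 100 Gbps EDR link. The PCIe Gen3 signalling rate (8 GT/s per lane) and 128b/130b encoding are standard PCIe figures brought in as assumptions, not numbers from the slides, and protocol overheads beyond line encoding are ignored.

```python
# Raw link rates: one PCIe Gen3 x8 link per socket versus one EDR link.
PCIE_GEN3_GT_PER_LANE = 8.0          # assumption: PCIe Gen3, 8 GT/s per lane
PCIE_GEN3_ENCODING = 128 / 130       # assumption: 128b/130b line encoding
LANES_PER_SOCKET_LINK = 8            # x8 link from each socket (slide 10)
SOCKETS = 2
EDR_GBITS = 100                      # 1 EDR link, 100 Gbps (slide 10)

x8_gbytes = PCIE_GEN3_GT_PER_LANE * PCIE_GEN3_ENCODING * LANES_PER_SOCKET_LINK / 8
both_sockets_gbytes = x8_gbytes * SOCKETS
edr_gbytes = EDR_GBITS / 8

print(f"one PCIe Gen3 x8 link: {x8_gbytes:.2f} GB/s")
print(f"two x8 links:          {both_sockets_gbytes:.2f} GB/s")
print(f"one EDR link:          {edr_gbytes:.2f} GB/s")
```

Under these assumptions a single x8 link cannot carry a full EDR link's 12.5 GB/s, while the two per-socket links together can, which is consistent with the > 12 GB/s between nodes reported on this slide.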
  • 25. Mini-App Performance on Cavium ThunderX2
      • ThunderX2 providing high memory bandwidth
          • 6 channels (Skylake) vs. 8 in ThunderX2
          • See this in MiniFE SpMV and STREAM Triad
      • Slower compute reflects less optimization in software stack
          • Examples: non-SpMV kernels in MiniFE and LULESH
          • GCC and ARM versus Intel compiler
      Chart: speedup over Haswell E5-2680v3 (axis 0 to 2.5) for MiniFE Solve GF/s, MiniFE SpMV GF/s, STREAM Triad, and LULESH Solve FOM; series: ThunderX2, Skylake 8160, Haswell E5-2680.
  • 26. R&D Areas
      • Leverage containers and virtual machines
          • Support for machine learning frameworks
          • ARMv8.1 includes new virtualization extensions, SR-IOV
          • Working with Singularity on full container solution
      • Evaluating parallel filesystems + I/O systems @ scale
          • GlusterFS, Ceph, BeeGFS, Sandia Data Warehouse, …
      • Resilience studies over Astra lifetime
      • Improved MPI thread support, matching acceleration
      • OS optimizations for HPC @ scale
          • Exploring HPC-tuned Linux kernels to non-Linux lightweight kernels and multi-kernels
          • Arm-specific optimizations
  • 27. Conclusion
      • Vanguard allows the DOE to take necessary risks to ensure a healthy HPC ecosystem for future production mission platforms
          • Increase technology choices
          • Prove ability to run multi-physics production applications at scale
      • Tri-lab software stack effort to mature ARM for ASC computing
          • Harden compilers, math libs, and tools
          • Optimize performance, verify expected results
          • Increase modularity and openness of software stack
          • Support traditional HPC and emerging AI + ML workloads
      • Sandia now a member of Linaro HPC SIG!
  • 29. Virt - From x86 to ARM64
      • What opportunities and challenges do we face when moving from an x86 world to an ARM world?
      • Virtual Machines
          • Near-native HPC performance with VMs possible in x86
              • Type1 & Type2 hypervisors
              • Hobbes / Palacios VM
          • Avoid legacy issues with x86?
          • Anything new we can do with ARM?
      • Containers
          • How do I build containers on my x86 laptop that run on Astra?
          • Focus on ABI compatibility
          • Leverage industry/enterprise without losing HPC focus