SlideShare a Scribd company logo
1 of 28
Download to read offline
Evaluating HPX and Kokkos on RISC-V using an
Astrophysics Application Octo-Tiger
Patrick Diehl
Joint work with: Gregor Daiß, Steven R. Brandt, Alireza Kheirkhahan,
Hartmut Kaiser, Christopher Taylor, and John Leidel
Louisiana State University
patrickdiehl@lsu.edu
November 13, 2023
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 1 / 28
Motivation
What is RISC-V?
RISC-V was introduced in 2015 as an open standard instruction set
architecture (ISA); RISC-V is an iteration on established reduced
instruction set computer (RISC) principles
Why is it interesting for the HPC community?
The RISC-V ISA is completely open for use by anyone and is
royalty-free.
RISC-V is extensible; processor features can be added to provide
customized capabilities (ie: Cache management, SIMD, and Vector
Machine support are optional)
European Processor Initiative (EPI), which aims to develop a
vendor-independent European CPU for high-performance computing,
has identified RISC-V as a target for future investment.
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 2 / 28
Overview
1 Astrophysical application
2 Software stack
Octo-Tiger
Kokkos
HPX
3 Porting the software stack to RISC-V
4 In-house RISC-V Test System
5 Performance measurements
Node level scaling
Distributed scaling
6 Energy consumption
7 Conclusion and Outlook
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 3 / 28
Astrophysical application
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 4 / 28
Example simulation
Astrophysical event: Merging of two stars – Flow on the surface which
corresponds in layman’s terms to the weather on the stars.
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 5 / 28
Software stack
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 6 / 28
Octo-Tiger
Astrophysics open source program1 simulating the evolution of star
systems based on the fast multipole method on adaptive Octrees.
Modules
Hydro
Gravity
Radiation
Supports
Communication:
MPI/libfabric/LCI/GASNet +
OpenSHMEM
Backends: CUDA, HIP, SYCL
Reference
Marcello, Dominic C., et al. ”octo-tiger: a new, 3D hydrodynamic code for stellar mergers that uses hpx
parallelization.” Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
1
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 7 / 28
Kokkos: C++ Performance Portability Programming
EcoSystem
Kokkos is a C++ library2 for writing performance portable applications
targeting all major HPC platforms
CPU
OpenMP
HPX
GPU
Native: CUDA & HIP
SYCL: CUDA & HIP
Reference
Trott, Christian R., et al. ”Kokkos 3: Programming model extensions for the exascale era.” IEEE Transactions on
Parallel and Distributed Systems 33.4 (2021): 805-817.
2
https://github.com/kokkos/kokkos
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 8 / 28
HPX
HPX is a open source C++ Standard Library for Concurrency and
Parallelism3.
Features
HPX exposes a uniform, standards-oriented API for ease of
programming parallel and distributed applications.
HPX provides unified syntax and semantics for local and remote
operations.
HPX exposes a uniform, flexible, and extendable performance counter
framework which can enable runtime adaptivity.
Reference
Kaiser, Hartmut, et al. ”HPX-the C++ standard library for parallelism and concurrency.” Journal of Open Source
Software 5.53 (2020): 2352.
3
https://github.com/STEllAR-GROUP/hpx
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 9 / 28
Porting the software stack to RISC-V
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 10 / 28
Porting HPX
Most parts of HPX are implemented ISO C++. However, small
portions of the runtime system are implemented using assembly.
The HPX context-switching software implementation can optionally
utilize Boost.Context support or a native independently provided
assembly implementation for a targeted ISA. Note HPX already relies
on Boost.
We had to do some single source code modification within the HPX
timer. The RISC-V HPX port implements timing using the RISC-V
RDTIME instruction. RDTIME is a pseudo-instruction that reads
bits from the time Control and Status Register (CSR).
Recall, that since we had an ISO C++ compiler and had Boost support, the
code changes were minimal.
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 11 / 28
Porting Kokkos and Octo-Tiger
Kokkos
Building Kokkos required no changes to the code base and GCC
compiled the Kokkos without any issues.
However, Kokkos’s build system CMake files required some minor
changes. The RISC-V architecture was not detected, and incorrect
compiler flags were added for the architecture and vectorization.
Octo-Tiger
Octo-Tiger needed no porting after HPX and Kokkos were already
ported.
Due to the abstraction levels provided by HPX and Kokkos porting the
software stack was a walk in the park.
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 12 / 28
In-house RISC-V Test System
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 13 / 28
In-house RISC-V test system
Image of the in-house cluster using
two VisionFive2 Open Source
RISC-V single board computers with
Quad-core StarFive JH7110 64-bit
CPU and 8GB LPDDR4 System
Memory.
Official image based on an older
Ubuntu version
Ubuntu Linux image based on
23.04 had the versions, we need
or the Slurm integration and
recent compilers.
The Ubuntu image does not
support USB and PCIe on the
VisionFive2.
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 14 / 28
Performance measurements
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 15 / 28
Asynchronous programming
2 4 6 8 10
0
0.5
1
1.5
·1011
# cores
Flop/s
hpx::async and hpx::future
A64FX
Intel
AMD
RISC-V (U74-MC)
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 16 / 28
Parallel algorithms
2 4 6 8 10
0
0.5
1
·1011
# cores
Flop/s
hpx::for each - hpx::execution::par
A64FX
Intel
AMD
RISC-V (U74-MC)
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 17 / 28
Hardware information
Table: Clock speed in GHz, vector length, FPU units per core, fused
multiplication and addition (FMA) capability, and number of cores for various
CPUs obtained by the vendor specification or via cat /proc/cpuinfo — grep MHz.
Note the FMA is only available for the 32-bit ISA on RISC-V.
CPU [GHz] Vector length FPU⋆ FMA Cores [GFLOP/s]
ARM 1.8 8 2 ✓ 48 2764.8
AMD 2.8 4 2 ✓ 64 2867.2
Intel 2.3 8 2 ✓ 18 1324.8
RISC-V 1.2 NA 1 4 9.6
⋆ units per core
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 18 / 28
Normalized performance
After investigating the FLOP per second, we look into the ratio of floating
point operations per second normalized by the theoretical peak
performance of each CPU. The theoretical peak performance is given by
the clock speed times the vector length times the number of FPU units (#
FPU) and the number of cores (# cores)
PerfPeak(#cores) =
2 × clock speed × vector length × # FPU × # cores. (1)
Most CPUs support fused multiplication and addition (FMA), providing a
factor of two in Equation 1. The normalized performance is given by
PerfNorm(# cores) =
FLOPs(# cores)
PerfPeak(#cores)
. (2)
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 19 / 28
2 4 6 8 10
0.1
0.2
0.3
# cores
FLOPs(#
cores)
/
Perf
Peak
(#cores)
hpx::async and hpx::future
A64FX Intel
AMD RISC-V (U74-MC)
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 20 / 28
2 4 6 8 10
0
0.1
0.2
0.3
0.4
# cores
FLOPs(#
cores)
/
Perf
Peak
(#cores)
hpx::for each - hpx::execution::par
A64FX Intel
AMD RISC-V (U74-MC)
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 21 / 28
0 200 400 600 800 1,000
1-RISC
1-Fugaku
2-RISC-TCP
2-RISC-MPI
2-Fugaku-MPI
91
168
140
778
1,091
Cells processed per second
Figure: Distributed scaling for a single node and two nodes using TCP or MPI for
communication on RISC-V. Unfortunately, our in-house cluster only had two
nodes. For a comparison, runs on a single and two Supercomputer Fugaku nodes
are shown (each using only four cores out of the 48 available ones for a better
comparison).
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 22 / 28
Energy consumption
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 23 / 28
How to measure energy consumption?
We want to compare the RISC-V boards and Supercomputer Fugaku for
the astrophysics application.
On Supercomputer Fugaku the power consumption was measured
with the PowerAPI interface provided by Riken.
On the RISC-V boards, no hardware counters for power measurements
are present. Here, we attached a power meter to the USB power
source and measured the power consumption while running the Linux
command stress –cpu 4 and while running Octo-Tiger with four cores.
Would be nice to have hardware counters to get more sophisticated
measurements for RISC-V!
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 24 / 28
Energy consumption on RISC-V and A64FX CPUs
0 0.5 1 1.5 2
1-RISC
1-Fugaku
2-RISC-TCP
2-RISC-MPI
2-Fugaku-MPI
1.19
1.28
1.53
0.92
1.46
Watth
Figure: On Supercomputer Fugaku, the power consumption was measured using
PowerAPI. Due to missing hardware counters, the power consumption was
measured using a power meter on RISC-V.
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 25 / 28
Conclusion and Outlook
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 26 / 28
Conclusion and Outlook
Conclusion
Porting the software stack was rather easy due to the advanced C++
compilers on RISC-V.
HPX and Octo-Tiger scaled from one up to four cores. However, more
RAM and more cores are needed for sophisticated benchmarking.
Outlook
Use HPC-grade hardware, e.g. Milk-V’s Pioneer with 64 Cores and up
to 128 GB RAM, and not development boards.
Add features that would benefit HPX and AMT, like one-cycle
context switches, extended atomics, hardware support for global
address space, and possibly hardware support for thread scheduling
(hardware queues); to the ongoing development of ISA extensions.
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 27 / 28
Advertisement
P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 28 / 28

More Related Content

Similar to Evaluating HPX and Kokkos on RISC-V Using an Astrophysics Application Octo-Tiger

OpenACC Monthly Highlights: July 2021
OpenACC Monthly Highlights: July  2021OpenACC Monthly Highlights: July  2021
OpenACC Monthly Highlights: July 2021OpenACC
 
01 From K to Fugaku
01 From K to Fugaku01 From K to Fugaku
01 From K to FugakuRCCSRENKEI
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIinside-BigData.com
 
Deployment of an HPC Cloud based on Intel hardware
Deployment of an HPC Cloud based on Intel hardwareDeployment of an HPC Cloud based on Intel hardware
Deployment of an HPC Cloud based on Intel hardwareIntel IT Center
 
OpenACC Monthly Highlights: May 2020
OpenACC Monthly Highlights: May 2020OpenACC Monthly Highlights: May 2020
OpenACC Monthly Highlights: May 2020OpenACC
 
OpenACC and Open Hackathons Monthly Highlights August 2022
OpenACC and Open Hackathons Monthly Highlights August 2022OpenACC and Open Hackathons Monthly Highlights August 2022
OpenACC and Open Hackathons Monthly Highlights August 2022OpenACC
 
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...
SX Aurora TSUBASA  (Vector Engine) a Brand-new Vector Supercomputing power in...SX Aurora TSUBASA  (Vector Engine) a Brand-new Vector Supercomputing power in...
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...inside-BigData.com
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersFacultad de Informática UCM
 
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptxOpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptxOpenACC
 
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Hajime Tazaki
 
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...Toradex
 
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...inside-BigData.com
 
Deploying a Task-based Runtime System on Raspberry Pi Clusters
Deploying a Task-based Runtime System on Raspberry Pi ClustersDeploying a Task-based Runtime System on Raspberry Pi Clusters
Deploying a Task-based Runtime System on Raspberry Pi ClustersPatrick Diehl
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018NVIDIA
 
OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC
 
Open cl programming using python syntax
Open cl programming using python syntaxOpen cl programming using python syntax
Open cl programming using python syntaxcsandit
 

Similar to Evaluating HPX and Kokkos on RISC-V Using an Astrophysics Application Octo-Tiger (20)

OpenACC Monthly Highlights: July 2021
OpenACC Monthly Highlights: July  2021OpenACC Monthly Highlights: July  2021
OpenACC Monthly Highlights: July 2021
 
Embedded system
Embedded systemEmbedded system
Embedded system
 
01 From K to Fugaku
01 From K to Fugaku01 From K to Fugaku
01 From K to Fugaku
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
 
Deployment of an HPC Cloud based on Intel hardware
Deployment of an HPC Cloud based on Intel hardwareDeployment of an HPC Cloud based on Intel hardware
Deployment of an HPC Cloud based on Intel hardware
 
OpenACC Monthly Highlights: May 2020
OpenACC Monthly Highlights: May 2020OpenACC Monthly Highlights: May 2020
OpenACC Monthly Highlights: May 2020
 
LPC4300_two_cores
LPC4300_two_coresLPC4300_two_cores
LPC4300_two_cores
 
OpenACC and Open Hackathons Monthly Highlights August 2022
OpenACC and Open Hackathons Monthly Highlights August 2022OpenACC and Open Hackathons Monthly Highlights August 2022
OpenACC and Open Hackathons Monthly Highlights August 2022
 
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...
SX Aurora TSUBASA  (Vector Engine) a Brand-new Vector Supercomputing power in...SX Aurora TSUBASA  (Vector Engine) a Brand-new Vector Supercomputing power in...
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation Computers
 
IJCRT2006062.pdf
IJCRT2006062.pdfIJCRT2006062.pdf
IJCRT2006062.pdf
 
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptxOpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
OpenACC and Open Hackathons Monthly Highlights: September 2022.pptx
 
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
 
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
 
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
 
Deploying a Task-based Runtime System on Raspberry Pi Clusters
Deploying a Task-based Runtime System on Raspberry Pi ClustersDeploying a Task-based Runtime System on Raspberry Pi Clusters
Deploying a Task-based Runtime System on Raspberry Pi Clusters
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020
 
Open cl programming using python syntax
Open cl programming using python syntaxOpen cl programming using python syntax
Open cl programming using python syntax
 

More from Patrick Diehl

D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and Tools
D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and ToolsD-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and Tools
D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and ToolsPatrick Diehl
 
Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HP...
Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HP...Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HP...
Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HP...Patrick Diehl
 
Subtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondSubtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondPatrick Diehl
 
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in FortranFramework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in FortranPatrick Diehl
 
JOSS and FLOSS for science: Examples for promoting open source software and s...
JOSS and FLOSS for science: Examples for promoting open source software and s...JOSS and FLOSS for science: Examples for promoting open source software and s...
JOSS and FLOSS for science: Examples for promoting open source software and s...Patrick Diehl
 
A tale of two approaches for coupling nonlocal and local models
A tale of two approaches for coupling nonlocal and local modelsA tale of two approaches for coupling nonlocal and local models
A tale of two approaches for coupling nonlocal and local modelsPatrick Diehl
 
Recent developments in HPX and Octo-Tiger
Recent developments in HPX and Octo-TigerRecent developments in HPX and Octo-Tiger
Recent developments in HPX and Octo-TigerPatrick Diehl
 
Challenges for coupling approaches for classical linear elasticity and bond-b...
Challenges for coupling approaches for classical linear elasticity and bond-b...Challenges for coupling approaches for classical linear elasticity and bond-b...
Challenges for coupling approaches for classical linear elasticity and bond-b...Patrick Diehl
 
Quantifying Overheads in Charm++ and HPX using Task Bench
Quantifying Overheads in Charm++ and HPX using Task BenchQuantifying Overheads in Charm++ and HPX using Task Bench
Quantifying Overheads in Charm++ and HPX using Task BenchPatrick Diehl
 
Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Interactive C++ code development using C++Explorer and GitHub Classroom for e...Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Interactive C++ code development using C++Explorer and GitHub Classroom for e...Patrick Diehl
 
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...Patrick Diehl
 
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...Patrick Diehl
 
Recent developments in HPX and Octo-Tiger
Recent developments in HPX and Octo-TigerRecent developments in HPX and Octo-Tiger
Recent developments in HPX and Octo-TigerPatrick Diehl
 
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...Patrick Diehl
 
A review of benchmark experiments for the validation of peridynamics models
A review of benchmark experiments for the validation of peridynamics modelsA review of benchmark experiments for the validation of peridynamics models
A review of benchmark experiments for the validation of peridynamics modelsPatrick Diehl
 
On the treatment of boundary conditions for bond-based peridynamic models
On the treatment of boundary conditions for bond-based peridynamic modelsOn the treatment of boundary conditions for bond-based peridynamic models
On the treatment of boundary conditions for bond-based peridynamic modelsPatrick Diehl
 
EMI 2021 - A comparative review of peridynamics and phase-field models for en...
EMI 2021 - A comparative review of peridynamics and phase-field models for en...EMI 2021 - A comparative review of peridynamics and phase-field models for en...
EMI 2021 - A comparative review of peridynamics and phase-field models for en...Patrick Diehl
 
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...Patrick Diehl
 

More from Patrick Diehl (18)

D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and Tools
D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and ToolsD-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and Tools
D-HPC Workshop Panel : S4PST: Stewardship of Programming Systems and Tools
 
Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HP...
Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HP...Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HP...
Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HP...
 
Subtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondSubtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff Hammond
 
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in FortranFramework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran
 
JOSS and FLOSS for science: Examples for promoting open source software and s...
JOSS and FLOSS for science: Examples for promoting open source software and s...JOSS and FLOSS for science: Examples for promoting open source software and s...
JOSS and FLOSS for science: Examples for promoting open source software and s...
 
A tale of two approaches for coupling nonlocal and local models
A tale of two approaches for coupling nonlocal and local modelsA tale of two approaches for coupling nonlocal and local models
A tale of two approaches for coupling nonlocal and local models
 
Recent developments in HPX and Octo-Tiger
Recent developments in HPX and Octo-TigerRecent developments in HPX and Octo-Tiger
Recent developments in HPX and Octo-Tiger
 
Challenges for coupling approaches for classical linear elasticity and bond-b...
Challenges for coupling approaches for classical linear elasticity and bond-b...Challenges for coupling approaches for classical linear elasticity and bond-b...
Challenges for coupling approaches for classical linear elasticity and bond-b...
 
Quantifying Overheads in Charm++ and HPX using Task Bench
Quantifying Overheads in Charm++ and HPX using Task BenchQuantifying Overheads in Charm++ and HPX using Task Bench
Quantifying Overheads in Charm++ and HPX using Task Bench
 
Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Interactive C++ code development using C++Explorer and GitHub Classroom for e...Interactive C++ code development using C++Explorer and GitHub Classroom for e...
Interactive C++ code development using C++Explorer and GitHub Classroom for e...
 
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
Porting our astrophysics application to Arm64FX and adding Arm64FX support us...
 
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
An asynchronous and task-based implementation of peridynamics utilizing HPX—t...
 
Recent developments in HPX and Octo-Tiger
Recent developments in HPX and Octo-TigerRecent developments in HPX and Octo-Tiger
Recent developments in HPX and Octo-Tiger
 
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
Quasistatic Fracture using Nonliner-Nonlocal Elastostatics with an Analytic T...
 
A review of benchmark experiments for the validation of peridynamics models
A review of benchmark experiments for the validation of peridynamics modelsA review of benchmark experiments for the validation of peridynamics models
A review of benchmark experiments for the validation of peridynamics models
 
On the treatment of boundary conditions for bond-based peridynamic models
On the treatment of boundary conditions for bond-based peridynamic modelsOn the treatment of boundary conditions for bond-based peridynamic models
On the treatment of boundary conditions for bond-based peridynamic models
 
EMI 2021 - A comparative review of peridynamics and phase-field models for en...
EMI 2021 - A comparative review of peridynamics and phase-field models for en...EMI 2021 - A comparative review of peridynamics and phase-field models for en...
EMI 2021 - A comparative review of peridynamics and phase-field models for en...
 
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
Google Summer of Code mentor summit 2020 - Session 2 - Open Science and Open ...
 

Recently uploaded

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 

Recently uploaded (20)

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 

Evaluating HPX and Kokkos on RISC-V Using an Astrophysics Application Octo-Tiger

  • 1. Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger Patrick Diehl Joint work with: Gregor Daiß, Steven R. Brandt, Alireza Kheirkhahan, Hartmut Kaiser, Christopher Taylor, and John Leidel Louisiana State University patrickdiehl@lsu.edu November 13, 2023 P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 1 / 28
  • 2. Motivation What is RISC-V? RISC-V was introduced in 2015 as an open standard instruction set architecture (ISA); RISC-V is an iteration on established reduced instruction set computer (RISC) principles Why is it interesting for the HPC community? The RISC-V ISA is completely open for use by anyone and is royalty-free. RISC-V is extensible; processor features can be added to provide customized capabilities (ie: Cache management, SIMD, and Vector Machine support are optional) European Processor Initiative (EPI), which aims to develop a vendor-independent European CPU for high-performance computing, has identified RISC-V as a target for future investment. P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 2 / 28
  • 3. Overview 1 Astrophysical application 2 Software stack Octo-Tiger Kokkos HPX 3 Porting the software stack to RISC-V 4 In-house RISC-V Test System 5 Performance measurements Node level scaling Distributed scaling 6 Energy consumption 7 Conclusion and Outlook P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 3 / 28
  • 4. Astrophysical application P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 4 / 28
  • 5. Example simulation Astrophysical event: Merging of two stars – Flow on the surface which corresponds in layman’s terms to the weather on the stars. P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 5 / 28
  • 6. Software stack P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 6 / 28
  • 7. Octo-Tiger Astrophysics open source program1 simulating the evolution of star systems based on the fast multipole method on adaptive Octrees. Modules Hydro Gravity Radiation Supports Communication: MPI/libfabric/LCI/GASNet + OpenSHMEM Backends: CUDA, HIP, SYCL Reference Marcello, Dominic C., et al. ”octo-tiger: a new, 3D hydrodynamic code for stellar mergers that uses hpx parallelization.” Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382. 1 P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 7 / 28
  • 8. Kokkos: C++ Performance Portability Programming EcoSystem Kokkos is a C++ library2 for writing performance portable applications targeting all major HPC platforms CPU OpenMP HPX GPU Native: CUDA & HIP SYCL: CUDA & HIP Reference Trott, Christian R., et al. ”Kokkos 3: Programming model extensions for the exascale era.” IEEE Transactions on Parallel and Distributed Systems 33.4 (2021): 805-817. 2 https://github.com/kokkos/kokkos P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 8 / 28
  • 9. HPX HPX is a open source C++ Standard Library for Concurrency and Parallelism3. Features HPX exposes a uniform, standards-oriented API for ease of programming parallel and distributed applications. HPX provides unified syntax and semantics for local and remote operations. HPX exposes a uniform, flexible, and extendable performance counter framework which can enable runtime adaptivity. Reference Kaiser, Hartmut, et al. ”HPX-the C++ standard library for parallelism and concurrency.” Journal of Open Source Software 5.53 (2020): 2352. 3 https://github.com/STEllAR-GROUP/hpx P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 9 / 28
  • 10. Porting the software stack to RISC-V P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 10 / 28
  • 11. Porting HPX Most parts of HPX are implemented ISO C++. However, small portions of the runtime system are implemented using assembly. The HPX context-switching software implementation can optionally utilize Boost.Context support or a native independently provided assembly implementation for a targeted ISA. Note HPX already relies on Boost. We had to do some single source code modification within the HPX timer. The RISC-V HPX port implements timing using the RISC-V RDTIME instruction. RDTIME is a pseudo-instruction that reads bits from the time Control and Status Register (CSR). Recall, that since we had an ISO C++ compiler and had Boost support, the code changes were minimal. P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 11 / 28
  • 12. Porting Kokkos and Octo-Tiger Kokkos Building Kokkos required no changes to the code base and GCC compiled the Kokkos without any issues. However, Kokkos’s build system CMake files required some minor changes. The RISC-V architecture was not detected, and incorrect compiler flags were added for the architecture and vectorization. Octo-Tiger Octo-Tiger needed no porting after HPX and Kokkos were already ported. Due to the abstraction levels provided by HPX and Kokkos porting the software stack was a walk in the park. P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 12 / 28
  • 13. In-house RISC-V Test System P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 13 / 28
  • 14. In-house RISC-V test system Image of the in-house cluster using two VisionFive2 Open Source RISC-V single board computers with Quad-core StarFive JH7110 64-bit CPU and 8GB LPDDR4 System Memory. Official image based on an older Ubuntu version Ubuntu Linux image based on 23.04 had the versions, we need or the Slurm integration and recent compilers. The Ubuntu image does not support USB and PCIe on the VisionFive2. P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 14 / 28
  • 15. Performance measurements P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 15 / 28
  • 16. Asynchronous programming 2 4 6 8 10 0 0.5 1 1.5 ·1011 # cores Flop/s hpx::async and hpx::future A64FX Intel AMD RISC-V (U74-MC) P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 16 / 28
  • 17. Parallel algorithms 2 4 6 8 10 0 0.5 1 ·1011 # cores Flop/s hpx::for each - hpx::execution::par A64FX Intel AMD RISC-V (U74-MC) P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 17 / 28
  • 18. Hardware information Table: Clock speed in GHz, vector length, FPU units per core, fused multiplication and addition (FMA) capability, and number of cores for various CPUs obtained by the vendor specification or via cat /proc/cpuinfo — grep MHz. Note the FMA is only available for the 32-bit ISA on RISC-V. CPU [GHz] Vector length FPU⋆ FMA Cores [GFLOP/s] ARM 1.8 8 2 ✓ 48 2764.8 AMD 2.8 4 2 ✓ 64 2867.2 Intel 2.3 8 2 ✓ 18 1324.8 RISC-V 1.2 NA 1 4 9.6 ⋆ units per core P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 18 / 28
  • 19. Normalized performance After investigating the FLOP per second, we look into the ratio of floating point operations per second normalized by the theoretical peak performance of each CPU. The theoretical peak performance is given by the clock speed times the vector length times the number of FPU units (# FPU) and the number of cores (# cores) PerfPeak(#cores) = 2 × clock speed × vector length × # FPU × # cores. (1) Most CPUs support fused multiplication and addition (FMA), providing a factor of two in Equation 1. The normalized performance is given by PerfNorm(# cores) = FLOPs(# cores) PerfPeak(#cores) . (2) P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 19 / 28
  • 20. 2 4 6 8 10 0.1 0.2 0.3 # cores FLOPs(# cores) / Perf Peak (#cores) hpx::async and hpx::future A64FX Intel AMD RISC-V (U74-MC) P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 20 / 28
  • 21. 2 4 6 8 10 0 0.1 0.2 0.3 0.4 # cores FLOPs(# cores) / Perf Peak (#cores) hpx::for each - hpx::execution::par A64FX Intel AMD RISC-V (U74-MC) P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 21 / 28
  • 22. 0 200 400 600 800 1,000 1-RISC 1-Fugaku 2-RISC-TCP 2-RISC-MPI 2-Fugaku-MPI 91 168 140 778 1,091 Cells processed per second Figure: Distributed scaling for a single node and two nodes using TCP or MPI for communication on RISC-V. Unfortunately, our in-house cluster only had two nodes. For a comparison, runs on a single and two Supercomputer Fugaku nodes are shown (each using only four cores out of the 48 available ones for a better comparison). P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 22 / 28
  • 23. Energy consumption P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 23 / 28
  • 24. How to measure energy consumption? We want to compare the RISC-V boards and Supercomputer Fugaku for the astrophysics application. On Supercomputer Fugaku the power consumption was measured with the PowerAPI interface provided by Riken. On the RISC-V boards, no hardware counters for power measurements are present. Here, we attached a power meter to the USB power source and measured the power consumption while running the Linux command stress –cpu 4 and while running Octo-Tiger with four cores. Would be nice to have hardware counters to get more sophisticated measurements for RISC-V! P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 24 / 28
  • 25. Energy consumption on RISC-V and A64FX CPUs 0 0.5 1 1.5 2 1-RISC 1-Fugaku 2-RISC-TCP 2-RISC-MPI 2-Fugaku-MPI 1.19 1.28 1.53 0.92 1.46 Watth Figure: On Supercomputer Fugaku, the power consumption was measured using PowerAPI. Due to missing hardware counters, the power consumption was measured using a power meter on RISC-V. P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 25 / 28
  • 26. Conclusion and Outlook P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 26 / 28
  • 27. Conclusion and Outlook Conclusion Porting the software stack was rather easy due to the advanced C++ compilers on RISC-V. HPX and Octo-Tiger scaled from one up to four cores. However, more RAM and more cores are needed for sophisticated benchmarking. Outlook Use HPC-grade hardware, e.g. Milk-V’s Pioneer with 64 Cores and up to 128 GB RAM, and not development boards. Add features that would benefit HPX and AMT, like one-cycle context switches, extended atomics, hardware support for global address space, and possibly hardware support for thread scheduling (hardware queues); to the ongoing development of ISA extensions. P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 27 / 28
  • 28. Advertisement P. Diehl (CCT/Physics/LSU) HPX and Kokkos on RISC-V November 13, 2023 28 / 28