SlideShare a Scribd company logo
1 of 64
Download to read offline
e
e
e
©ARM 2017
Scientific Computing on ARM
Part 1
Chris Adeniyi-Jones
Linaro Connect - Budapest
Principal Engineer, Software and Large-Scale Systems
ARM Research
Thursday 9th March 2017
©ARM 20172
e
e
e
Agenda
§ How this got started …
§ Ecosystem for HPC
§ Compilers
§ Libraries
§ Debuggers
§ Profilers
§ ScalableVector Extension
§ FinalThoughts
§ Hands-on Sessions
©ARM 20173
e
e
e
Develop an energy-efficient HPC prototype
using low-power commodity embedded
technology.
Small beginnings: Mont-Blanc Project - 2011
Compute
Acceleration
Prototype
evaluation
Performance
Prediction MemoryInterconnect
Design
Study
Port and optimize HPC
applications on the
prototype
Research into technologies
required for the next-generation
HPC system
©ARM 20174
e
e
e
Mont-Blanc HPC Software Ecosystem
©ARM 20175
e
e
e
ARM RESEARCH
Why “Commodity”?
Supercomputing with Commodity CPUs:Are Mobile SoCs Ready for HPC?
Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic,Alex Ramirez, MateoValero
Cost
Availability
Performance
Energy efficiency?
©ARM 20176
e
e
e
Serious ARM HPC deployments starting in 2017
§ Two big announcements in 2017 about ARM in HPC in Europe
©ARM 20177
e
e
e
Fujitsu HPC CPU
slides from Fujitsu at ISC’16
©ARM 20178
e
Ecosystem for HPC
©ARM 20179
e
e
e
Ecosystem for HPC
List of software components needed:
§ Linux OS availability
§ Compilers
§ Libraries
§ Debuggers
§ Profilers
§ Job schedulers
Mix of open source and commercial products and applications…
©ARM 201710
e
e
e
Released
Planned
Concept
2016
Hardware
Open-Source
software
ARM HPC
tools
ARM HPC ecosystem roadmap
2017 Future
Cavium ThunderX
Allinea DDT and MAP
Rogue Wave TotalView
AppliedMicro X-Gene 1 & 2
Fujitsu – Post K (SVE)
ISV software
AMD Seattle
Altair PBS Pro
Cavium ThunderX2
Qualcomm Centriq
NAG Library & Compiler
ARM Optimized Routines
GCC (gcc/g++/gfortran)
LLVM - clang LLVM – Flang
ISV software
ARM Code Advisor (Beta) ARM Code Advisor (Full release)
ARM Optimized Routines – vector versions
Phytium Mars
OpenHPC 1.2
ARM Performance Libraries
PathScale ENZO
ARM Fortran CompilerARM C/C++ Compiler – ahead of LLVM trunk
ARM Instruction Emulator
AppliedMicro X-Gene 3
©ARM 201711
e
e
e
Open source in the ARM HPC ecosystem
§ Over the past 12 months many more packages and applications have been
successfully ported to ARM HPC
©ARM 201712
e
e
e
Linux / FreeBSD w/ AARCH64 support
ß Engaged with FreeBSD foundation / Semi-half & Cavium to get FreeBSD on ARMv8
FreeBSD Beta version demo’d by Semihalf – Nov. 2015
12.04LTS & 14.04LTS
released
ß Also 14.10 & 15.04 releases
Red Hat Enterprise Linux Server for ARM
7.2 BETA – Sept, 2015
Fedora 22 released – May 2015
Fedora 23 released – Nov 2015 CentOS Linux 7 for AArch64
GA – August 2015
OpenSUSE 13.2 – Nov 2014 SUSE Launches Partner Program to Bring
SUSE Linux Enterprise 12 to 64-bit ARM
July 2015 @ ISC
Debian 8 adds AARCH64 – April 2015
©ARM 201713
e
e
e
HPC filesystems
Software Status
LUSTRE Ported
HDFS Ported
CEPH Ported
BeeGFS Ported
©ARM 201714
e
e
e
Workload and cluster managers
Software Status
IBM LSF Ported
HP CMU Ported
SLURM Ported
Adaptive Computing (Moab) Future
Altair PBSWorks Ported
OpenLava (LSF port) Ported
©ARM 201715
e
Compilers
©ARM 201716
e
e
e
Open source and commercial compilers
§ LLVM
§ C, C++
§ OpenMP 3.1, (4.0 coming soon)
§ Fortran coming Q1 2017
§ PathScale
§ C, C++ Fortran
§ OpenACC
§ OpenMP 4.0
§ NAG
§ Fortran
§ OpenMP 3.1
§ GCC
§ C, C++, Fortran
§ OpenMP 4.0
§ ARM C/C++ Compiler
§ LLVM based
§ Includes SVE
©ARM 201717
e
e
e
e
ARM C/C++ Compiler
Commercially supported C/C++ compiler for Linux user-space HPC applications
LLVM-based
§ ARM-on-ARM compiler
§ For application development (not bare-metal/embedded)
Regularly pulls from upstream LLVM, adding:
§ SVE support in the assembler, disassembler, intrinsics and autovectorizer
§ Compiler Insights to support ARM Code Advisor
OpenMP
§ Uses latest open source (now ARM-optimized) LLVM OpenMP runtime
§ Changes pushed back to the community
©ARM 201718
e
e
e
ARM C/C++ Compiler – usage
§ To compile C code:
§ To compile C++ code:
% armclang -O3 file.c –o file
% armclang++ -O3 file.cpp –o file
©ARM 201719
e
e
e
Common armclang options
Flag Description
--help Describes the most common options supported by
ARM C/C++ Compiler
--vsn
--version
Displays version information and license details
-O<level> Specifies the level of optimization to use when compiling source files.
The default is -O0
-c Performs the compilation step, but does not perform the link step.
Produces an ELF object .o file. Run armclang again, passing in the
object files to link
-o <file> Specifies the name of the output file
-fopenmp Use OpenMP
-S Outputs assembly code, rather than object code. Produces a text .s file
containing annotated assembly code
©ARM 201720
e
e
e
LLVM OpenMP development
§ We have been contributing
ARM-related upstream patches
to LLVM’s ‘libomp’
§ Example shown here is the Lulesh
benchmark at size=80 running on
1-96 cores
§ GCC has slightly better serial
performance
§ libomp demonstrates superior
scaling
§ Even when used with GCC!
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
0 16 32 48 64 80 96
GCC + GOMP GCC + OMP LLVM + OMP
Number of threads
Zonespersecond
©ARM 201721
e
e
e
Parallelism to enable optimal HPC performance
§ OpenMP
§ We are adding enhancements to the LLVM OpenMP implementation to get better
AArch64 performance
§ ARM is active member of the OpenMP Standards Committee
§ OpenACC
§ PathScale and PGI are strong supporters of OpenACC
§ supported for ARM within ENZO
§ Auto-vectorization
§ ARM actively works on vectorization in GCC and LLVM, and encourages work with
vectorization support in the compiler community.
§ PathScale’s compiler has vectorization support built in
§ MPI parallelism “just works”
§ Better Infiniband driver support is coming from Mellanox
©ARM 201722
e
Libraries
©ARM 201723
e
e
e
OpenHPC is a community effort to provide a
common, verified set of open source packages for
HPC deployments
ARM’s participation:
§ Silver member of OpenHPC
§ ARM is on the OpenHPCTechnical Steering
Committee in order to drive ARM build support
Status: 1.2.0 release out now
§ All packages built on ARMv8 for CentOS and SUSE
§ ARM-based machines are being used for building
and also in the OpenHPC build infrastructure
Functional
Areas
Components include
Base OS RHEL/CentOS 7.1, SLES 12
Administrative
Tools
Conman, Ganglia, Lmod, LosF, ORCM, Nagios, pdsh,
prun
Provisioning Warewulf
Resource Mgmt. SLURM, Munge. Altair PBS Pro*
I/O Services Lustre client (community version)
Numerical/Scientifi
c Libraries
Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre,
SuperLU, Mumps
I/O Libraries HDF5 (pHDF5), NetCDF (including C++ and Fortran
interfaces), Adios
Compiler Families GNU (gcc, g++, gfortran)
MPI Families OpenMPI, MVAPICH2
Development
Tools
Autotools (autoconf, automake, libtool), Valgrind,R,
SciPy/NumPy
Performance
Tools
PAPI, Intel IMB, mpiP, pdtoolkit TAU
– now on ARM
©ARM 201724
e
e
e
Open source library AArch64 inbuilt tuning work
ARM actively working with the community for increased support and performance
§ OpenBLAS
§ ARMv8 kernels included
§ BLIS
§ BLIS developers have close relationship with ARM
§ BLIS supports various ARM processors by default (e.g.ARM Cortex-A53, Cortex-A57 CPUs)
§ Also currently conducting ARM big.LITTLE development
§ ATLAS
§ Work ongoing with ARM Research team
§ Cortex-A57/A53 patches went into ATLAS
§ FFTW
§ Just works. NEON options built into v3.3.5
©ARM 201725
e
e
e
Commercial library support
Product availability for 64-bit ARMv8-A
§ ARM Performance Libraries
§ See following slides
§ NAG Library
§ Largest commercially available collection of numerical and statistical algorithms
§ > 1800 functions
§ Scales well, takes advantage of ARM Performance Libraries
§ Tested on Juno (ARM) and ThunderX (Cavium)
§ PathScale BLAS implementation
§ Comes with their ENZO compiler
©ARM 201726
e
e
e
ARM Performance Libraries
Commercial 64-bit ARMv8 math libraries
§ Commonly used low-level math routines - BLAS, LAPACK and FFT
§ Validated with NAG’s test suite, a de-facto standard
Best-in-class performance with commercial support
§ Tuned by ARM for Cortex-A72, Cortex-A57 and Cortex-A53
§ Maintained and Supported by ARM for a wide range of ARM-based SoCs
§ Regular benchmarking against open source alternatives
Silicon partners can provide tuned micro-kernels for their SoCs
§ Partners can collaborate directly working with our source-code and test
suite
§ Alternatively they can contribute through open source route
Commercially Supported
by ARM
Validated with
NAG test suite
Performance on par
with best-in-class math libraries
©ARM 201727
e
e
e
e
ARM Performance Libraries
Version 2.0: Improving performance and interoperability
BLAS
§ Enhanced parallelism for BLAS level 1
§ Hand-tuned kernels for BLAS level 3
LAPACK (3.6.1)
§ Added PLASMA-style Directed Acyclic Graphs for better
parallelism
FFTW Interface
§ Support for FFTW-compatible basic and advanced DFT interfaces
ScalableVector Extensions
§ Libraries built with SVE-capable compilers
§ Hand-written DGEMM and SGEMM kernels
©ARM 201728
e
e
e
ARM Performance Libraries – micro-architecture
§ ARM cores have a variety of designs,
created by both ARM and our partners
§ ARM Performance Libraries are creating
tailored versions of routines to target
these different micro-architectures
§ It is important to ensure that the correct
version is installed on your system
§ For example consider the different
performance in DGEMM running the
Cortex-A53 and Cortex-A57 kernels on
the right and wrong cores…
©ARM 201729
e
e
e
ARM Performance Libraries – linking
§ In order to compile applications using BLAS, LAPACK and FFT routines from
the ARM Performance libraries, four options are provided:
§ Serial and OpenMP builds
§ 32-bit and 64-bit integers
§ These translate into four binaries in /opt/arm/armpl-*/lib/
§ libarmpl.a
§ libarmpl_mp.a
§ Shared libraries (libarmpl*.so) are also provided
§ Compile and link using, for example
armclang -O3 file.c –fopenmp –c file.o –I${ARMPL_DIR}/include
armclang -O3 file.o –fopenmp –o file –L${ARMPL_DIR}/lib –larmpl_mp
§ libarmpl_int64.a
§ libarmpl_int64_mp.a
e
e
e
©ARM 2017
Scientific Computing on ARM
Part 2
Chris Adeniyi-Jones
Linaro Connect - Budapest
Principal Engineer, Software and Large-Scale Systems
ARM Research
Thursday 9th March 2017
©ARM 201731
e
Debuggers
©ARM 201732
e
e
e
Open source debugging tools
§ All the usual open source tools you would use for debugging work as expected
on ARMv8
§ printf()
§ gdb
§ valgrind (and its associated tools)
§ It is worth ensuring that the versions on your system are recent
§ Check Linaro webpages for up to date source
©ARM 201733
e
e
e
§ Common HPC debug environment
§ Active development for 30+ years
§ Thread specific breakpoints
§ Control individual thread execution
§ View thread specific stack and data
§ View complex data types easily
§ Track memory leaks in running applications
§ Supports C/C++ on Linux
§ Allowing the business to have
§ Predictable development schedules
§ Less time spent debugging
§ ARM support
§ Beta just concluded
§ Early access available November 2016 – version 2016.07
§ Full release planned for Feb 2017 – version 2017.1
TotalView for HPC
©ARM 201734
e
e
e
Allinea DDT
Parallel stack
view
Automated data
comparison:
sparklines
Parallel array
searching
Step, play, and
breakpoints
Offline
debugging
§ Who had a rogue behaviour ?
§ Merges stacks from processes and threads
§ Where did it happen?
§ Allinea DDT leaps to source automatically
§ How did it happen?
§ Detailed error message given to the user
§ Some faults evident instantly from source
§ Why did it happen?
§ Unique “Smart Highlighting”
§ Sparklines comparing data across processes
The debugger for C, C++ and Fortran threaded and parallel code
©ARM 201735
e
e
e
ARM debugging gotcha
§ Weakly ordered memory access means that changes to memory can be applied
in any order as long as single-core execution sees the data needed for program
correctness
§ Most parallel HPC codes we encountered used GCC’s libgomp (-fopenmp)
§ These behaved correctly on AArch64 using GCC 5.2
§ Some HPC codes we have come across have their own bespoke parallelization
§ Usually based directly on top of pthreads
§ Written to have more control over the threads of execution and how they synchronize
§ Problems are almost always down to a lock-free thread interaction implementation
Thread 0
A=1;
set=2;
Thread 1
if (set==2) C = A;
Weakly ordered memory model
©ARM 201736
e
Profilers
©ARM 201737
e
e
e
Performance hints
§ Use latest compiler version, not OS default
§ Use OMP_PROC_BIND=TRUE
§ Make sure you use OMP_PLACES if doing more than one run
©ARM 201738
e
e
e
§ Challenges
§ One (non-ARM) customer came to Allinea with help getting their code working better
§ Huge speed up on CONVERGE: from 2h down to 4 sec
§ Now possible to run jobs efficiently on hundreds of cores
§ Now possible to scale up from 2 million to 20 million nodes
Full case study:
http://www.allinea.com/news/201509/convergent-science-
ignites-combustion-simulation-performance-allinea-forge
Profiling – always important!
©ARM 201739
e
e
e
Profiling/debugging tools
Open source tools ported to ARM HPC
and “just work”
Tested applications include:
• PAPI and Perf counters
• Score-P, Cube, Scalasca
• MPE and Jumpshot
• TAU
©ARM 201740
e
e
e
e
ARM Code Advisor (Beta)
Combines static and dynamic information to produce actionable insights
Performance Advice
§ Compiler vectorization hints
§ Compilation flags advice
§ Fortran subarray warnings
§ OpenMP instrumentation
Insights from compilation and runtime
§ Compiler Insights are embedded into the application
binary by the ARM Compilers
§ OMPT interface used to instrument OpenMP runtime
Extensible Architecture
§ Users can write plugins to add their own analysis information
§ Data accessible via web-browser, command-line, and REST API to support new user interfaces
©ARM 201741
e
e
e
e
ARM Code Advisor (Beta)
Typical workflow
Source
Code Compile
Compiled
Binary
+Insight
Runtime
Profile
Profile
Analyze Web
View
HTTP
©ARM 201742
e
e
e
ARM Code Advisor – usage
§ Compile application with Insight functionality:
§ Run code:
§ Analyze collected data:
§ View analysis:
% armclang++ -O3 –insight file.cpp –o file
% armcadvisor collect ./example
Starting collection of program [./example] to
profile temp in /home/user1/armcadvisor-profiles
% armcadvisor analyze example
Generating analysis file, armcadvisor.advice
% armcadvisor web –-network --everyone –p 2010
Open your browser to one of:
http://server.arm.com:8080
http://127.0.0.1:8080
©ARM 201743
e
e
e
ARM Code Advisor – video
©ARM 201744
e
e
e
ARM Allinea MAP
Low overhead measurement
• Accurate, non-intrusive application
performance profiling
• Seamless – no recompilation or relinking
required
Easy to use
• Source code viewer pinpoints
bottleneck locations
• Zoom in to explore iterations,
functions and loops
Deep
• Measures CPU, communication, I/O and
memory to identify problem causes
• Identifies vectorization and cache
performance
©ARM 201745
e
e
e
caption
Energy efficiency with ARM’s Allinea tools
MACHINE
ALLINEA
ENERGY
EFFICIENT
APPLICATIONS
©ARM 201746
e
e
e
Allinea Forge: what’s new in 7.0 – examples
Lustre metrics in MAP
Custom metrics in MAP
PAPI metrics in PR
©ARM 201747
e
e
e
Quantify gains immediately
CPU RUN
GPU RUN
©ARM 201748
e
ARMv8-A
ScalableVector Extension
©ARM 201749
e
e
e
Introducing the ScalableVector Extension (SVE)
A vector extension to the ARMv8-A architecture; its major new features:
§ Gather-load and scatter-store
§ Per-lane predication
§ Predicate-driven loop control and management
§ Vector partitioning and SW managed speculation
§ Extended integer and floating-point horizontal reductions
SVE is not an extension of Advanced SIMD
§ A separate architectural extension with a new set of A64 instruction encodings
§ Focus is HPC scientific workloads, not media/image processing
©ARM 201750
e
e
e
What’s the vector length?
§ There is no preferred vector length
§ Vector Length (VL) is hardware choice,
from 128 to 2048 bits, in increments of 128
§ Does not need to be a power-of-2
§ Vector Length Agnostic programming adjusts
dynamically to the availableVL
§ No need to recompile, or to rewrite hand-
coded SVE assembler or C intrinsics
§ Has extensive implications for loop
optimizations
VL
128b
0 64
256b
0 128
384b
0 128 256
128
256
384
512b
0 128 256 384 512
©ARM 201751
e
e
e
SVE Compiler status
§ ARM C/C++ Compiler has SVE capabilities built in
§ We have a public snapshot of our LLVM changes
§ https://github.com/ARM-Software/LLVM-SVE
§ These are being incrementally upstreamed into the main LLVM codebase
§ We have started upstreaming our GCC changes
§ Changes to GNU binutils are already upstream
§ For more details of SVE usage see our blogs and whitepapers
§ https://www.community.arm.com/processors/b/blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture
§ https://cms.developer.arm.com/-/media/developer/developers/hpc/white-papers/a-sneak-peek-into-sve-and-vla-programming.pdf
©ARM 201752
e
e
e
Compiling with SVE
§ To compile C code:
§ To compile C++ code:
% armclang -O3 -march=armv8-a+sve file.c –o file
% armclang++ -O3 -march=armv8-a+sve file.cpp –o file
©ARM 201753
e
e
e
e
ARM Instruction Emulator
Run SVE binaries at near native speed on existing ARMv8-A hardware
Trap-and-emulate of illegal userspace
instructions
§ Natively supported instructions run at full speed
§ Unsupported instructions are faithfully emulated in
software
Full integration with ARM Code Advisor
§ Plugin allows ARM Instruction Emulator to provide
hotspot information and other metrics
§ Command-line integration allows ARM Code Advisor
workflows to seamlessly integrate with ARM
Instruction Emulator
ARMV8-A
Linux
Converts unsupported
SVE instructions to
native ARMv8-A
instructions
ARMv8 Binary ARMv8 SVE Binary
ARM Instruction Emulator
©ARM 201754
e
e
e
Running with ARM Instruction Emulator (1)
§ To run SVE compiled code:
§ To list valid vector lengths:
% armie --vector-length 256 ./example
% armie --list-vectors
128 256 384 512 640 768 896 1024 1152 1280 1408
1536 1664 1792 1920 2048
©ARM 201755
e
e
e
ARM Code Advisor with SVE
§ Compile code with Insight as before
§ Run the executable using both ARM Code Advisor and ARM
Instruction Emulator
§ Analysis and visualization stages as before
% armcadvisor collect --armie --vector-length 256 
“./lulesh2.0 -s 15”
% armclang -O3 -march=armv8-a+sve -insight file.c –o file
©ARM 201756
e
Final thoughts
©ARM 201757
e
e
e
ARM HPC in 2017
§ The software ecosystem has matured significantly in the past two years
§ Commercial compilers, libraries, debuggers and profilers are all now available to
complement open source projects
§ Users will notice very few hurdles to migrating codes as programming
environments are very familiar
§ ARM HPC hardware continues to appear from many partners with small proof
of concept systems turning into bigger systems
§ SVE machines will only enhance performance further
©ARM 201758
e
e
e
Developer website : arm.com/hpc
ARM launched an HPC-specific microsite –
home to our HPC ecosystem offering:
• Technical reference material
• How-to guides
• Latest news and updates from partners
• Links to downloads of HPC libraries
• Third-party software recommendations
• Web forum for community discussion and
help
Participate and help drive the community
©ARM 201759
e
e
e
e
ARM HPC Products from arm.com/hpc
Supported, Integrated and Performant, available now
ARM Compiler for HPC ARM SVE Compiler for HPC ARM Code Advisor Beta
§ ARM C/C++ Compiler
§ ARM Performance Libraries
§ GCC 6.0
§ ARM C/C++ SVE Compiler
§ ARM SVE Performance Libraries
§ GCC 6.0
§ ARM Instruction Emulator
§ GCC for SVE with LLVM OpenMP
§ ARM Code Advisor
§ ARM C/C++ Compiler
§ gfortran with LLVM OpenMP
for ARM Code Advisor
©ARM 201760
e
e
e
Summary of today
§ Overview of the state of the ARM HPC Ecosystem
§ Hands-on experience running codes on a selection of ARM-based hardware
§ Introduced to commercial HPC tools developed by ARM and ecosystem partners
§ Visit arm.com/hpc for more information and tool access
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited
(or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be
trademarks of their respective owners.
Copyright © 2017 ARM Limited
©ARM 2017
http://www.arm.com/hpc
Contact: hpc@arm.com
©ARM 201762
e
Practical
©ARM 201763
e
e
e
Session 1
Using Linaro Developer Cloud instances
Username Password Work-Node
ssh user00@64.28.99.111 # Login to the login server
ssh node-0 # then login to a worker node
Files describing the hands-on are in your home directory:
session_1.txt session_1.pdf
Run this on your local machine to copy the session info file:
scp user00@64.28.99.111:/home/user00/session_1.pdf session_1.pdf
©ARM 201764
e
e
e
Using Linaro Developer Cloud instances
Example Username Password Work-Node
ssh user00@64.28.99.111 # Login to the login server
ssh node-0 # then login to a worker node
Files describing the hands-on are in you home directory:
session_2.txt session_2.pdf
Run this on your local machine to copy the session info file:
scp user00@64.28.99.111:/home/user00/session_2.pdf session_2.pdf
Session 2

More Related Content

More from Linaro

It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...Linaro
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramLinaro
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNLinaro
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...Linaro
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...Linaro
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionLinaro
 

More from Linaro (20)

It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: Introduction
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

BUD17-403: Tutorial Scientific Computing on ARM-based Platforms

  • 1. e e e ©ARM 2017 Scientific Computing on ARM Part 1 Chris Adeniyi-Jones Linaro Connect - Budapest Principal Engineer, Software and Large-Scale Systems ARM Research Thursday 9th March 2017
  • 2. ©ARM 20172 e e e Agenda § How this got started … § Ecosystem for HPC § Compilers § Libraries § Debuggers § Profilers § ScalableVector Extension § FinalThoughts § Hands-on Sessions
  • 3. ©ARM 20173 e e e Develop an energy-efficient HPC prototype using low-power commodity embedded technology. Small beginnings: Mont-Blanc Project - 2011 Compute Acceleration Prototype evaluation Performance Prediction MemoryInterconnect Design Study Port and optimize HPC applications on the prototype Research into technologies required for the next-generation HPC system
  • 4. ©ARM 20174 e e e Mont-Blanc HPC Software Ecosystem
  • 5. ©ARM 20175 e e e ARM RESEARCH Why “Commodity”? Supercomputing with Commodity CPUs:Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic,Alex Ramirez, MateoValero Cost Availability Performance Energy efficiency?
  • 6. ©ARM 20176 e e e Serious ARM HPC deployments starting in 2017 § Two big announcements in 2017 about ARM in HPC in Europe
  • 7. ©ARM 20177 e e e Fujitsu HPC CPU slides from Fujitsu at ISC’16
  • 9. ©ARM 20179 e e e Ecosystem for HPC List of software components needed: § Linux OS availability § Compilers § Libraries § Debuggers § Profilers § Job schedulers Mix of open source and commercial products and applications…
  • 10. ©ARM 201710 e e e Released Planned Concept 2016 Hardware Open-Source software ARM HPC tools ARM HPC ecosystem roadmap 2017 Future Cavium ThunderX Allinea DDT and MAP Rogue Wave TotalView AppliedMicro X-Gene 1 & 2 Fujitsu – Post K (SVE) ISV software AMD Seattle Altair PBS Pro Cavium ThunderX2 Qualcomm Centriq NAG Library & Compiler ARM Optimized Routines GCC (gcc/g++/gfortran) LLVM - clang LLVM – Flang ISV software ARM Code Advisor (Beta) ARM Code Advisor (Full release) ARM Optimized Routines – vector versions Phytium Mars OpenHPC 1.2 ARM Performance Libraries PathScale ENZO ARM Fortran CompilerARM C/C++ Compiler – ahead of LLVM trunk ARM Instruction Emulator AppliedMicro X-Gene 3
  • 11. ©ARM 201711 e e e Open source in the ARM HPC ecosystem § Over the past 12 months many more packages and applications have been successfully ported to ARM HPC
  • 12. ©ARM 201712 e e e Linux / FreeBSD w/ AARCH64 support ß Engaged with FreeBSD foundation / Semi-half & Cavium to get FreeBSD on ARMv8 FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04 releases Red Hat Enterprise Linux Server for ARM 7.2 BETA – Sept, 2015 Fedora 22 released – May 2015 Fedora 23 released – Nov 2015 CentOS Linux 7 for AArch64 GA – August 2015 OpenSUSE 13.2 – Nov 2014 SUSE Launches Partner Program to Bring SUSE Linux Enterprise 12 to 64-bit ARM July 2015 @ ISC Debian 8 adds AARCH64 – April 2015
  • 13. ©ARM 201713 e e e HPC filesystems Software Status LUSTRE Ported HDFS Ported CEPH Ported BeeGFS Ported
  • 14. ©ARM 201714 e e e Workload and cluster managers Software Status IBM LSF Ported HP CMU Ported SLURM Ported Adaptive Computing (Moab) Future Altair PBSWorks Ported OpenLava (LSF port) Ported
  • 16. ©ARM 201716 e e e Open source and commercial compilers § LLVM § C, C++ § OpenMP 3.1, (4.0 coming soon) § Fortran coming Q1 2017 § PathScale § C, C++ Fortran § OpenACC § OpenMP 4.0 § NAG § Fortran § OpenMP 3.1 § GCC § C, C++, Fortran § OpenMP 4.0 § ARM C/C++ Compiler § LLVM based § Includes SVE
  • 17. ©ARM 201717 e e e e ARM C/C++ Compiler Commercially supported C/C++ compiler for Linux user-space HPC applications LLVM-based § ARM-on-ARM compiler § For application development (not bare-metal/embedded) Regularly pulls from upstream LLVM, adding: § SVE support in the assembler, disassembler, intrinsics and autovectorizer § Compiler Insights to support ARM Code Advisor OpenMP § Uses latest open source (now ARM-optimized) LLVM OpenMP runtime § Changes pushed back to the community
  • 18. ©ARM 201718 e e e ARM C/C++ Compiler – usage § To compile C code: § To compile C++ code: % armclang -O3 file.c –o file % armclang++ -O3 file.cpp –o file
  • 19. ©ARM 201719 e e e Common armclang options Flag Description --help Describes the most common options supported by ARM C/C++ Compiler --vsn --version Displays version information and license details -O<level> Specifies the level of optimization to use when compiling source files. The default is -O0 -c Performs the compilation step, but does not perform the link step. Produces an ELF object .o file. Run armclang again, passing in the object files to link -o <file> Specifies the name of the output file -fopenmp Use OpenMP -S Outputs assembly code, rather than object code. Produces a text .s file containing annotated assembly code
  • 20. ©ARM 201720 e e e LLVM OpenMP development § We have been contributing ARM-related upstream patches to LLVM’s ‘libomp’ § Example shown here is the Lulesh benchmark at size=80 running on 1-96 cores § GCC has slightly better serial performance § libomp demonstrates superior scaling § Even when used with GCC! 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 0 16 32 48 64 80 96 GCC + GOMP GCC + OMP LLVM + OMP Number of threads Zonespersecond
  • 21. ©ARM 201721 e e e Parallelism to enable optimal HPC performance § OpenMP § We are adding enhancements to the LLVM OpenMP implementation to get better AArch64 performance § ARM is active member of the OpenMP Standards Committee § OpenACC § PathScale and PGI are strong supporters of OpenACC § supported for ARM within ENZO § Auto-vectorization § ARM actively works on vectorization in GCC and LLVM, and encourages work with vectorization support in the compiler community. § PathScale’s compiler has vectorization support built in § MPI parallelism “just works” § Better Infiniband driver support is coming from Mellanox
  • 23. ©ARM 201723 e e e OpenHPC is a community effort to provide a common, verified set of open source packages for HPC deployments ARM’s participation: § Silver member of OpenHPC § ARM is on the OpenHPCTechnical Steering Committee in order to drive ARM build support Status: 1.2.0 release out now § All packages built on ARMv8 for CentOS and SUSE § ARM-based machines are being used for building and also in the OpenHPC build infrastructure Functional Areas Components include Base OS RHEL/CentOS 7.1, SLES 12 Administrative Tools Conman, Ganglia, Lmod, LosF, ORCM, Nagios, pdsh, prun Provisioning Warewulf Resource Mgmt. SLURM, Munge. Altair PBS Pro* I/O Services Lustre client (community version) Numerical/Scientifi c Libraries Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre, SuperLU, Mumps I/O Libraries HDF5 (pHDF5), NetCDF (including C++ and Fortran interfaces), Adios Compiler Families GNU (gcc, g++, gfortran) MPI Families OpenMPI, MVAPICH2 Development Tools Autotools (autoconf, automake, libtool), Valgrind,R, SciPy/NumPy Performance Tools PAPI, Intel IMB, mpiP, pdtoolkit TAU – now on ARM
  • 24. ©ARM 201724 e e e Open source library AArch64 inbuilt tuning work ARM actively working with the community for increased support and performance § OpenBLAS § ARMv8 kernels included § BLIS § BLIS developers have close relationship with ARM § BLIS supports various ARM processors by default (e.g.ARM Cortex-A53, Cortex-A57 CPUs) § Also currently conducting ARM big.LITTLE development § ATLAS § Work ongoing with ARM Research team § Cortex-A57/A53 patches went into ATLAS § FFTW § Just works. NEON options built into v3.3.5
  • 25. ©ARM 201725 e e e Commercial library support Product availability for 64-bit ARMv8-A § ARM Performance Libraries § See following slides § NAG Library § Largest commercially available collection of numerical and statistical algorithms § > 1800 functions § Scales well, takes advantage of ARM Performance Libraries § Tested on Juno (ARM) and ThunderX (Cavium) § PathScale BLAS implementation § Comes with their ENZO compiler
  • 26. ©ARM 201726 e e e ARM Performance Libraries Commercial 64-bit ARMv8 math libraries § Commonly used low-level math routines - BLAS, LAPACK and FFT § Validated with NAG’s test suite, a de-facto standard Best-in-class performance with commercial support § Tuned by ARM for Cortex-A72, Cortex-A57 and Cortex-A53 § Maintained and Supported by ARM for a wide range of ARM-based SoCs § Regular benchmarking against open source alternatives Silicon partners can provide tuned micro-kernels for their SoCs § Partners can collaborate directly working with our source-code and test suite § Alternatively they can contribute through open source route Commercially Supported by ARM Validated with NAG test suite Performance on par with best-in-class math libraries
  • 27. ©ARM 201727 e e e e ARM Performance Libraries Version 2.0: Improving performance and interoperability BLAS § Enhanced parallelism for BLAS level 1 § Hand-tuned kernels for BLAS level 3 LAPACK (3.6.1) § Added PLASMA-style Directed Acyclic Graphs for better parallelism FFTW Interface § Support for FFTW-compatible basic and advanced DFT interfaces ScalableVector Extensions § Libraries built with SVE-capable compilers § Hand-written DGEMM and SGEMM kernels
  • 28. ©ARM 201728 e e e ARM Performance Libraries – micro-architecture § ARM cores have a variety of designs, created by both ARM and our partners § ARM Performance Libraries are creating tailored versions of routines to target these different micro-architectures § It is important to ensure that the correct version is installed on your system § For example consider the different performance in DGEMM running the Cortex-A53 and Cortex-A57 kernels on the right and wrong cores…
  • 29. ©ARM 201729 e e e ARM Performance Libraries – linking § In order to compile applications using BLAS, LAPACK and FFT routines from the ARM Performance libraries, four options are provided: § Serial and OpenMP builds § 32-bit and 64-bit integers § These translate into four binaries in /opt/arm/armpl-*/lib/ § libarmpl.a § libarmpl_mp.a § Shared libraries (libarmpl*.so) are also provided § Compile and link using, for example armclang -O3 file.c –fopenmp –c file.o –I${ARMPL_DIR}/include armclang -O3 file.o –fopenmp –o file –L${ARMPL_DIR}/lib –larmpl_mp § libarmpl_int64.a § libarmpl_int64_mp.a
  • 30. e e e ©ARM 2017 Scientific Computing on ARM Part 2 Chris Adeniyi-Jones Linaro Connect - Budapest Principal Engineer, Software and Large-Scale Systems ARM Research Thursday 9th March 2017
  • 32. ©ARM 201732 e e e Open source debugging tools § All the usual open source tools you would use for debugging work as expected on ARMv8 § printf() § gdb § valgrind (and its associated tools) § It is worth ensuring that the versions on your system are recent § Check Linaro webpages for up to date source
  • 33. ©ARM 201733 e e e § Common HPC debug environment § Active development for 30+ years § Thread specific breakpoints § Control individual thread execution § View thread specific stack and data § View complex data types easily § Track memory leaks in running applications § Supports C/C++ on Linux § Allowing the business to have § Predictable development schedules § Less time spent debugging § ARM support § Beta just concluded § Early access available November 2016 – version 2016.07 § Full release planned for Feb 2017 – version 2017.1 TotalView for HPC
  • 34. ©ARM 201734 e e e Allinea DDT Parallel stack view Automated data comparison: sparklines Parallel array searching Step, play, and breakpoints Offline debugging § Who had a rogue behaviour ? § Merges stacks from processes and threads § Where did it happen? § Allinea DDT leaps to source automatically § How did it happen? § Detailed error message given to the user § Some faults evident instantly from source § Why did it happen? § Unique “Smart Highlighting” § Sparklines comparing data across processes The debugger for C, C++ and Fortran threaded and parallel code
  • 35. ©ARM 201735 e e e ARM debugging gotcha § Weakly ordered memory access means that changes to memory can be applied in any order as long as single-core execution sees the data needed for program correctness § Most parallel HPC codes we encountered used GCC’s libgomp (-fopenmp) § These behaved correctly on AArch64 using GCC 5.2 § Some HPC codes we have come across have their own bespoke parallelization § Usually based directly on top of pthreads § Written to have more control over the threads of execution and how they synchronize § Problems are almost always down to a lock-free thread interaction implementation Thread 0 A=1; set=2; Thread 1 if (set==2) C = A; Weakly ordered memory model
  • 37. ©ARM 201737 e e e Performance hints § Use latest compiler version, not OS default § Use OMP_PROC_BIND=TRUE § Make sure you use OMP_PLACES if doing more than one run
  • 38. ©ARM 201738 e e e § Challenges § One (non-ARM) customer came to Allinea with help getting their code working better § Huge speed up on CONVERGE: from 2h down to 4 sec § Now possible to run jobs efficiently on hundreds of cores § Now possible to scale up from 2 million to 20 million nodes Full case study: http://www.allinea.com/news/201509/convergent-science- ignites-combustion-simulation-performance-allinea-forge Profiling – always important!
  • 39. ©ARM 201739 e e e Profiling/debugging tools Open source tools ported to ARM HPC and “just work” Tested applications include: • PAPI and Perf counters • Score-P, Cube, Scalasca • MPE and Jumpshot • TAU
  • 40. ©ARM 201740 e e e e ARM Code Advisor (Beta) Combines static and dynamic information to produce actionable insights Performance Advice § Compiler vectorization hints § Compilation flags advice § Fortran subarray warnings § OpenMP instrumentation Insights from compilation and runtime § Compiler Insights are embedded into the application binary by the ARM Compilers § OMPT interface used to instrument OpenMP runtime Extensible Architecture § Users can write plugins to add their own analysis information § Data accessible via web-browser, command-line, and REST API to support new user interfaces
  • 41. ©ARM 201741 e e e e ARM Code Advisor (Beta) Typical workflow Source Code Compile Compiled Binary +Insight Runtime Profile Profile Analyze Web View HTTP
  • 42. ©ARM 201742 e e e ARM Code Advisor – usage § Compile application with Insight functionality: § Run code: § Analyze collected data: § View analysis: % armclang++ -O3 –insight file.cpp –o file % armcadvisor collect ./example Starting collection of program [./example] to profile temp in /home/user1/armcadvisor-profiles % armcadvisor analyze example Generating analysis file, armcadvisor.advice % armcadvisor web –-network --everyone –p 2010 Open your browser to one of: http://server.arm.com:8080 http://127.0.0.1:8080
  • 43. ©ARM 201743 e e e ARM Code Advisor – video
  • 44. ©ARM 201744 e e e ARM Allinea MAP Low overhead measurement • Accurate, non-intrusive application performance profiling • Seamless – no recompilation or relinking required Easy to use • Source code viewer pinpoints bottleneck locations • Zoom in to explore iterations, functions and loops Deep • Measures CPU, communication, I/O and memory to identify problem causes • Identifies vectorization and cache performance
  • 45. ©ARM 201745 e e e caption Energy efficiency with ARM’s Allinea tools MACHINE ALLINEA ENERGY EFFICIENT APPLICATIONS
  • 46. ©ARM 201746 e e e Allinea Forge: what’s new in 7.0 – examples Lustre metrics in MAP Custom metrics in MAP PAPI metrics in PR
  • 47. ©ARM 201747 e e e Quantify gains immediately CPU RUN GPU RUN
  • 49. ©ARM 201749 e e e Introducing the ScalableVector Extension (SVE) A vector extension to the ARMv8-A architecture; its major new features: § Gather-load and scatter-store § Per-lane predication § Predicate-driven loop control and management § Vector partitioning and SW managed speculation § Extended integer and floating-point horizontal reductions SVE is not an extension of Advanced SIMD § A separate architectural extension with a new set of A64 instruction encodings § Focus is HPC scientific workloads, not media/image processing
  • 50. ©ARM 201750 e e e What’s the vector length? § There is no preferred vector length § Vector Length (VL) is hardware choice, from 128 to 2048 bits, in increments of 128 § Does not need to be a power-of-2 § Vector Length Agnostic programming adjusts dynamically to the availableVL § No need to recompile, or to rewrite hand- coded SVE assembler or C intrinsics § Has extensive implications for loop optimizations VL 128b 0 64 256b 0 128 384b 0 128 256 128 256 384 512b 0 128 256 384 512
  • 51. ©ARM 201751 e e e SVE Compiler status § ARM C/C++ Compiler has SVE capabilities built in § We have a public snapshot of our LLVM changes § https://github.com/ARM-Software/LLVM-SVE § These are being incrementally upstreamed into the main LLVM codebase § We have started upstreaming our GCC changes § Changes to GNU binutils are already upstream § For more details of SVE usage see our blogs and whitepapers § https://www.community.arm.com/processors/b/blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture § https://cms.developer.arm.com/-/media/developer/developers/hpc/white-papers/a-sneak-peek-into-sve-and-vla-programming.pdf
  • 52. ©ARM 201752 e e e Compiling with SVE § To compile C code: § To compile C++ code: % armclang -O3 -march=armv8-a+sve file.c –o file % armclang++ -O3 -march=armv8-a+sve file.cpp –o file
  • 53. ©ARM 201753 e e e e ARM Instruction Emulator Run SVE binaries at near native speed on existing ARMv8-A hardware Trap-and-emulate of illegal userspace instructions § Natively supported instructions run at full speed § Unsupported instructions are faithfully emulated in software Full integration with ARM Code Advisor § Plugin allows ARM Instruction Emulator to provide hotspot information and other metrics § Command-line integration allows ARM Code Advisor workflows to seamlessly integrate with ARM Instruction Emulator ARMV8-A Linux Converts unsupported SVE instructions to native ARMv8-A instructions ARMv8 Binary ARMv8 SVE Binary ARM Instruction Emulator
  • 54. ©ARM 201754 e e e Running with ARM Instruction Emulator (1) § To run SVE compiled code: § To list valid vector lengths: % armie --vector-length 256 ./example % armie --list-vectors 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
  • 55. ©ARM 201755 e e e ARM Code Advisor with SVE § Compile code with Insight as before § Run the executable using both ARM Code Advisor and ARM Instruction Emulator § Analysis and visualization stages as before % armcadvisor collect --armie --vector-length 256 “./lulesh2.0 -s 15” % armclang -O3 -march=armv8-a+sve -insight file.c –o file
  • 57. ©ARM 201757 e e e ARM HPC in 2017 § The software ecosystem has matured significantly in the past two years § Commercial compilers, libraries, debuggers and profilers are all now available to complement open source projects § Users will notice very few hurdles to migrating codes as programming environments are very familiar § ARM HPC hardware continues to appear from many partners with small proof of concept systems turning into bigger systems § SVE machines will only enhance performance further
  • 58. ©ARM 201758 e e e Developer website : arm.com/hpc ARM launched an HPC-specific microsite – home to our HPC ecosystem offering: • Technical reference material • How-to guides • Latest news and updates from partners • Links to downloads of HPC libraries • Third-party software recommendations • Web forum for community discussion and help Participate and help drive the community
  • 59. ©ARM 201759 e e e e ARM HPC Products from arm.com/hpc Supported, Integrated and Performant, available now ARM Compiler for HPC ARM SVE Compiler for HPC ARM Code Advisor Beta § ARM C/C++ Compiler § ARM Performance Libraries § GCC 6.0 § ARM C/C++ SVE Compiler § ARM SVE Performance Libraries § GCC 6.0 § ARM Instruction Emulator § GCC for SVE with LLVM OpenMP § ARM Code Advisor § ARM C/C++ Compiler § gfortran with LLVM OpenMP for ARM Code Advisor
  • 60. ©ARM 201760 e e e Summary of today § Overview of the state of the ARM HPC Ecosystem § Hands-on experience running codes on a selection of ARM-based hardware § Introduced to commercial HPC tools developed by ARM and ecosystem partners § Visit arm.com/hpc for more information and tool access
  • 61. The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. Copyright © 2017 ARM Limited ©ARM 2017 http://www.arm.com/hpc Contact: hpc@arm.com
  • 63. ©ARM 201763 e e e Session 1 Using Linaro Developer Cloud instances Username Password Work-Node ssh user00@64.28.99.111 # Login to the login server ssh node-0 # then login to a worker node Files describing the hands-on are in your home directory: session_1.txt session_1.pdf Run this on your local machine to copy the session info file: scp user00@64.28.99.111:/home/user00/session_1.pdf session_1.pdf
  • 64. ©ARM 201764 e e e Using Linaro Developer Cloud instances Example Username Password Work-Node ssh user00@64.28.99.111 # Login to the login server ssh node-0 # then login to a worker node Files describing the hands-on are in you home directory: session_2.txt session_2.pdf Run this on your local machine to copy the session info file: scp user00@64.28.99.111:/home/user00/session_2.pdf session_2.pdf Session 2