SlideShare a Scribd company logo
1 of 35
Download to read offline
©ARM 20171
13th ANNUAL WORKSHOP 2017
RDMA on ARM
Pavel Shamis (Pasha), Principal Research Engineer
March, 2017
ARM
©ARM 20172
RDMA on ARM ?
©ARM 20173
ARM Servers from multiple manufacturers
©ARM 20174
ARM Overview
©ARM 20175
An introduction to ARM
ARM is the world's leading semiconductor
intellectual property supplier.
We license to over 350 partners, are present in 95% of smart
phones, 80% of digital cameras, 35% of all electronic devices, and a
total of 60 billion ARM cores have been shipped since 1990.
Our CPU business model:
License technology to partners, who use it to create their own
system-on-chip (SoC) products.
 We may license an instruction set architecture (ISA) such as
“ARMv8-A”)
 or a specific implementation, such as “Cortex-A72”.
 Partners who license an ISA can create their own
implementation, as long as it passes the compliance tests.
…and our IP extends beyond the CPU
©ARM 20176
A business model that shares success
 Everyone in the value chain benefits
 Long term sustainability
Design once and reuse is fundamental
 Spread the cost amongst many partners
 Technology reused across multiple applications
 Creates market for ecosystem to target
 Re-use is also fundamental to the ecosystem
Upfront license fee
 Covers the development cost
Ongoing royalties
 Typically based on a percentage of chip price
 Vested interest in success of customers
A partnership business model
Business Development
Licence fee
IP
Provider
SemiCo
Partner
ARM Licenses
technology to
Partner
Partners
develop
chips
OEM
Customer
OEM sells
consumer
products
Royalty
Approximately 1350 licenses
Grows by ~120 every year
More than 440 potential
royalty payers
14.8bn+ ARM-powered chips in
2015
©ARM 20177
Partnership
©ARM 20178
Range of SoCs addressing infrastructure
Highly Accelerated Massively Multicore
One size does not fit all
QorIQ®
Layerscape
2080A
©ARM 20179
Serious ARM HPC deployments starting in 2017
Two big announcements about ARM in HPC in Europe:
©ARM 201710
Japan
slides from Fujitsu at ISC’16
©ARM 201711
Software Eco-system
©ARM 201712
Released
Planned
Concept
2016
Hardware
Open-Source
software
ARM HPC
tools
ARM HPC ecosystem roadmap
2017 Future
Cavium ThunderX
Allinea DDT and MAP
Rogue Wave TotalView
AppliedMicro X-Gene 1 & 2
Fujitsu – Post K (SVE)
ISV software
AMD Seattle
Altair PBS Pro
Cavium ThunderX2
Qualcomm Centriq
NAG Library & Compiler
ARM Optimized Routines
GCC (gcc/g++/gfortran)
LLVM - clang LLVM – Flang
ISV software
ARM Code Advisor (Beta) ARM Code Advisor (Full release)
ARM Optimized Routines – vector versions
Phytium Mars
OpenHPC 1.2
ARM Performance Libraries
PathScale ENZO
ARM Fortran CompilerARM C/C++ Compiler – ahead of LLVM trunk
ARM Instruction Emulator
AppliedMicro X-Gene 3
©ARM 201713
Linux / FreeBSD w/ AARCH64 support
 Engaged with FreeBSD foundation / Semi-half & Cavium to get FreeBSD on ARMv8
FreeBSD Beta version demo’d by Semihalf – Nov. 2015
12.04LTS & 14.04LTS
released
 Also 14.10 & 15.04 releases
Red Hat Enterprise Linux Server for ARM
7.2 BETA – Sept, 2015
Fedora 22 released – May 2015
Fedora 23 released – Nov 2015 CentOS Linux 7 for AArch64
GA – August 2015
OpenSUSE 13.2 – Nov 2014 SUSE Launches Partner Program to Bring
SUSE Linux Enterprise 12 to 64-bit ARM
July 2015 @ ISC
Debian 8 adds AARCH64 – April 2015
©ARM 201714
Open source and commercial compilers
 LLVM
 C, C++
 OpenMP 3.1, (4.0 coming soon)
 Fortran coming Q1 2017
 PathScale
 C, C++ Fortran
 OpenACC
 OpenMP 4.0
 NAG
 Fortran
 OpenMP 3.1
 GCC
 C, C++, Fortran
 OpenMP 4.0
 ARM C/C++ Compiler
 LLVM based
 Includes SVE
©ARM 201715
RESEARCH & DEVELOPMENT
©ARM 201716
RDMA
©ARM 201717
RDMA Support
 Mellanox OFED 2.4 and above supports ARM
 Linux Kernel 4.5.0 and above (maybe even earlier)
 OFED – No support
 Linux Distribution – on going process
©ARM 201718
Lessons Learned
 Memory Barriers
 Multithread environment
 Software-hardware interaction
 Examples https://github.com/openucx/ucx/blob/master/src/ucs/arch/aarch64/cpu.h#L25
 You can “fish” for these bugs in MPI implementations around Eager-RDMA and shared
memory protocols
Write Payload
Write Notify
Busy-wait Read Notify
Read Payload
Write Barrier Read Barrier
RDMA
RDMA
Maranget, Luc, Susmit Sarkar, and Peter Sewell. "A tutorial introduction to the ARM and POWER relaxed memory models." Draft available from
http://www.cl. cam. ac. uk/~ pes20/ppc-supplemental/test7. pdf (2012).
©ARM 201719
Lessons Learned - continued
 Low-level timers
 Typically found in benchmarks and MPI
 Code examples https://github.com/openucx/ucx/blob/master/src/ucs/arch/aarch64/cpu.h#L35
©ARM 201720
Lessons Learned – continued
 Not all cache-lines are 64Byte !
 Implementation dependent
 128Byte and 64Byte
http://xeroxnostalgia.com/duplicators/xerox-9200/
©ARM 201721
Optimizations
 AVX => Neon
 Mostly found around communication request initialization codes
https://github.com/openucx/ucx/blob/891e20ef90257d1e2721da52461b0261220c82d8/src/uct/
ib/mlx5/ib_mlx5.inl#L160
 Busy-wait loop
 SeeWait-For-Event (WFE)
Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“
©ARM 201722
Preliminary Results
©ARM 201723
Testbed
• 2 x Softiron Overdrive 3000 servers with AMD Opteron A1100 / 2GHz
• ConnectX-4 IB/VPI EDR (PCIe g2 x8)
• Ubuntu 16.04
• MOFED 3.3-1.5.0.0
• UCX [0558b41]
• XPMEM [bdfcc52]
• OSHMEM/OPEN-MPI [fed4849]
©ARM 201724
OpenUCX and OpenSHMEM on ARM
HPC Applications
OpenSHMEM
UCX SPML
UCX
Framework
UCP
MLX5 UCT Verbs UCT XPMEM UCT
©ARM 201725
OpenUCX with InfiniBand
40%
Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“
©ARM 201726
SHMEM_WAIT()
73%
35%
Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“
©ARM 201727
OpenSHMEM SSCA
Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“
7-30%
©ARM 201728
OpenSHMEM GUPs
Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“
21%
©ARM 201729
Summary
 Linux RDMA community is doing great job !
 A lot of progress was made in ARM HPC/server software
eco-system
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited
(or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be
trademarks of their respective owners.
Copyright © 2017 ARM Limited
©ARM 2017
©ARM 201731
Backup
©ARM 201732
CCIX
 Accelerators and Network (NIC/HCA/etc.) as a first
class “citizen” in the system
 Seamless process and accelerator hardware cache
coherence support
 Low-latency and high-bandwidth
 Allow in-line acceleration
 Bump in the wire processing (network packet processing, storage
acceleration, etc.)
 Allows “off-line” acceleration (co-processor model)
 Driver-less / interrupt-less usage model
 http://www.ccixconsortium.com
©ARM 201733
CCIX
CCIX Enables System Connectivity For All Elements (CPU, GPU, IO, FPGA)
©ARM 201734
Gen-Z
 All data is accessed by some form of a Read or a Write
 Example of reads: DDR Row + Column Read, PCI DMA Read, SCSIWrite,
Socket Read, File Read, RDMA Read
 Example of writes: DDR Row + Column Write, PCI DMA Write, SCSI Read,
SocketWrite, FileWrite, RDMA Write
 The Goal: Simplify world to memory semantic Reads &Writes
©ARM 201735
Gen-Z Overview
 An open, standards-based, scalable, system interconnect and protocol.
 Optimized to support memory semantic communications
 Breaks Processor-Memory Interlock
 Split controller model
 Memory controller
 Initiates high-level requests—Read,Write,Atomic, Put / Get, etc.
 Enforces ordering, reliability, path selection, etc.
 Media controller
 Abstracts memory media
 Supports volatile / non-volatile / mixed-media
 Performs media-specific operations
 Executes requests and returns responses
 Enables data-centric computing (accelerator, compute, etc.)

More Related Content

What's hot

ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortNAVER D2
 
HPE Data Protector Best Practice Guide
HPE Data Protector Best Practice GuideHPE Data Protector Best Practice Guide
HPE Data Protector Best Practice GuideAndrey Karpov
 
Pulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionPulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionStreamNative
 
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_FinalGopi Krishnamurthy
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowJulien Le Dem
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsKaran Singh
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V International
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufVerverica
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsDavid Portnoy
 
Implementation & Comparison Of Rdma Over Ethernet
Implementation & Comparison Of Rdma Over EthernetImplementation & Comparison Of Rdma Over Ethernet
Implementation & Comparison Of Rdma Over EthernetJames Wernicke
 
LAS16-TR06: Remoteproc & rpmsg development
LAS16-TR06: Remoteproc & rpmsg developmentLAS16-TR06: Remoteproc & rpmsg development
LAS16-TR06: Remoteproc & rpmsg developmentLinaro
 
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...Linaro
 
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...AMD Developer Central
 

What's hot (20)

ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
 
HPE Data Protector Best Practice Guide
HPE Data Protector Best Practice GuideHPE Data Protector Best Practice Guide
HPE Data Protector Best Practice Guide
 
Pulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionPulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless Evolution
 
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
librados
libradoslibrados
librados
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
NetApp & Storage fundamentals
NetApp & Storage fundamentalsNetApp & Storage fundamentals
NetApp & Storage fundamentals
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
 
Tungsten Fabric and DPDK vRouter Architecture
Tungsten Fabric and DPDK vRouter ArchitectureTungsten Fabric and DPDK vRouter Architecture
Tungsten Fabric and DPDK vRouter Architecture
 
Implementation & Comparison Of Rdma Over Ethernet
Implementation & Comparison Of Rdma Over EthernetImplementation & Comparison Of Rdma Over Ethernet
Implementation & Comparison Of Rdma Over Ethernet
 
LAS16-TR06: Remoteproc & rpmsg development
LAS16-TR06: Remoteproc & rpmsg developmentLAS16-TR06: Remoteproc & rpmsg development
LAS16-TR06: Remoteproc & rpmsg development
 
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
 
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
 

Similar to RDMA on ARM

Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...Arm
 
Efficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesEfficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesArm
 
Arm as a Viable Architecture for HPC and AI
Arm as a Viable Architecture for HPC and AIArm as a Viable Architecture for HPC and AI
Arm as a Viable Architecture for HPC and AIinside-BigData.com
 
Arm - ceph on arm update
Arm - ceph on arm updateArm - ceph on arm update
Arm - ceph on arm updateinwin stack
 
Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Arm DynamIQ: Intelligent Solutions Using Cluster Based MultiprocessingArm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Arm DynamIQ: Intelligent Solutions Using Cluster Based MultiprocessingArm
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Eric Van Hensbergen
 
Arm based controller - basic bootcamp
Arm based controller - basic bootcampArm based controller - basic bootcamp
Arm based controller - basic bootcampRoy Messinger
 
ARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introductionARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introductionanand hd
 
How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?Hannes Tschofenig
 
SOMNIUM DRT C++ tools for Microchip ARM microcontrollers
SOMNIUM DRT C++ tools for Microchip ARM microcontrollersSOMNIUM DRT C++ tools for Microchip ARM microcontrollers
SOMNIUM DRT C++ tools for Microchip ARM microcontrollersDaniel O'Hara
 
OFI Overview 2019 Webinar
OFI Overview 2019 WebinarOFI Overview 2019 Webinar
OFI Overview 2019 Webinarseanhefty
 
An Update on the European Processor Initiative
An Update on the European Processor InitiativeAn Update on the European Processor Initiative
An Update on the European Processor Initiativeinside-BigData.com
 
LCE12: LCE12 ARMv8 Plenary
LCE12: LCE12 ARMv8 PlenaryLCE12: LCE12 ARMv8 Plenary
LCE12: LCE12 ARMv8 PlenaryLinaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
 
LAS16-112: mbed OS Technical Overview
LAS16-112: mbed OS Technical OverviewLAS16-112: mbed OS Technical Overview
LAS16-112: mbed OS Technical OverviewLinaro
 

Similar to RDMA on ARM (20)

Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
Optimizing ARM cortex a and cortex-m based heterogeneous multiprocessor syste...
 
Efficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesEfficient software development with heterogeneous devices
Efficient software development with heterogeneous devices
 
Arm in HPC
Arm in HPCArm in HPC
Arm in HPC
 
ARM HPC Ecosystem
ARM HPC EcosystemARM HPC Ecosystem
ARM HPC Ecosystem
 
Arm as a Viable Architecture for HPC and AI
Arm as a Viable Architecture for HPC and AIArm as a Viable Architecture for HPC and AI
Arm as a Viable Architecture for HPC and AI
 
Arm - ceph on arm update
Arm - ceph on arm updateArm - ceph on arm update
Arm - ceph on arm update
 
HPC Network Stack on ARM
HPC Network Stack on ARMHPC Network Stack on ARM
HPC Network Stack on ARM
 
Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Arm DynamIQ: Intelligent Solutions Using Cluster Based MultiprocessingArm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
Arm based controller - basic bootcamp
Arm based controller - basic bootcampArm based controller - basic bootcamp
Arm based controller - basic bootcamp
 
ARM.pdf
ARM.pdfARM.pdf
ARM.pdf
 
ARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introductionARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introduction
 
How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?
 
SOMNIUM DRT C++ tools for Microchip ARM microcontrollers
SOMNIUM DRT C++ tools for Microchip ARM microcontrollersSOMNIUM DRT C++ tools for Microchip ARM microcontrollers
SOMNIUM DRT C++ tools for Microchip ARM microcontrollers
 
OFI Overview 2019 Webinar
OFI Overview 2019 WebinarOFI Overview 2019 Webinar
OFI Overview 2019 Webinar
 
An Update on the European Processor Initiative
An Update on the European Processor InitiativeAn Update on the European Processor Initiative
An Update on the European Processor Initiative
 
LCE12: LCE12 ARMv8 Plenary
LCE12: LCE12 ARMv8 PlenaryLCE12: LCE12 ARMv8 Plenary
LCE12: LCE12 ARMv8 Plenary
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
ARM Processor Tutorial
ARM Processor Tutorial ARM Processor Tutorial
ARM Processor Tutorial
 
LAS16-112: mbed OS Technical Overview
LAS16-112: mbed OS Technical OverviewLAS16-112: mbed OS Technical Overview
LAS16-112: mbed OS Technical Overview
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

RDMA on ARM

  • 1. ©ARM 20171 13th ANNUAL WORKSHOP 2017 RDMA on ARM Pavel Shamis (Pasha), Principal Research Engineer March, 2017 ARM
  • 3. ©ARM 20173 ARM Servers from multiple manufacturers
  • 5. ©ARM 20175 An introduction to ARM ARM is the world's leading semiconductor intellectual property supplier. We license to over 350 partners, are present in 95% of smart phones, 80% of digital cameras, 35% of all electronic devices, and a total of 60 billion ARM cores have been shipped since 1990. Our CPU business model: License technology to partners, who use it to create their own system-on-chip (SoC) products.  We may license an instruction set architecture (ISA) such as “ARMv8-A”)  or a specific implementation, such as “Cortex-A72”.  Partners who license an ISA can create their own implementation, as long as it passes the compliance tests. …and our IP extends beyond the CPU
  • 6. ©ARM 20176 A business model that shares success  Everyone in the value chain benefits  Long term sustainability Design once and reuse is fundamental  Spread the cost amongst many partners  Technology reused across multiple applications  Creates market for ecosystem to target  Re-use is also fundamental to the ecosystem Upfront license fee  Covers the development cost Ongoing royalties  Typically based on a percentage of chip price  Vested interest in success of customers A partnership business model Business Development Licence fee IP Provider SemiCo Partner ARM Licenses technology to Partner Partners develop chips OEM Customer OEM sells consumer products Royalty Approximately 1350 licenses Grows by ~120 every year More than 440 potential royalty payers 14.8bn+ ARM-powered chips in 2015
  • 8. ©ARM 20178 Range of SoCs addressing infrastructure Highly Accelerated Massively Multicore One size does not fit all QorIQ® Layerscape 2080A
  • 9. ©ARM 20179 Serious ARM HPC deployments starting in 2017 Two big announcements about ARM in HPC in Europe:
  • 10. ©ARM 201710 Japan slides from Fujitsu at ISC’16
  • 12. ©ARM 201712 Released Planned Concept 2016 Hardware Open-Source software ARM HPC tools ARM HPC ecosystem roadmap 2017 Future Cavium ThunderX Allinea DDT and MAP Rogue Wave TotalView AppliedMicro X-Gene 1 & 2 Fujitsu – Post K (SVE) ISV software AMD Seattle Altair PBS Pro Cavium ThunderX2 Qualcomm Centriq NAG Library & Compiler ARM Optimized Routines GCC (gcc/g++/gfortran) LLVM - clang LLVM – Flang ISV software ARM Code Advisor (Beta) ARM Code Advisor (Full release) ARM Optimized Routines – vector versions Phytium Mars OpenHPC 1.2 ARM Performance Libraries PathScale ENZO ARM Fortran CompilerARM C/C++ Compiler – ahead of LLVM trunk ARM Instruction Emulator AppliedMicro X-Gene 3
  • 13. ©ARM 201713 Linux / FreeBSD w/ AARCH64 support  Engaged with FreeBSD foundation / Semi-half & Cavium to get FreeBSD on ARMv8 FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released  Also 14.10 & 15.04 releases Red Hat Enterprise Linux Server for ARM 7.2 BETA – Sept, 2015 Fedora 22 released – May 2015 Fedora 23 released – Nov 2015 CentOS Linux 7 for AArch64 GA – August 2015 OpenSUSE 13.2 – Nov 2014 SUSE Launches Partner Program to Bring SUSE Linux Enterprise 12 to 64-bit ARM July 2015 @ ISC Debian 8 adds AARCH64 – April 2015
  • 14. ©ARM 201714 Open source and commercial compilers  LLVM  C, C++  OpenMP 3.1, (4.0 coming soon)  Fortran coming Q1 2017  PathScale  C, C++ Fortran  OpenACC  OpenMP 4.0  NAG  Fortran  OpenMP 3.1  GCC  C, C++, Fortran  OpenMP 4.0  ARM C/C++ Compiler  LLVM based  Includes SVE
  • 15. ©ARM 201715 RESEARCH & DEVELOPMENT
  • 17. ©ARM 201717 RDMA Support  Mellanox OFED 2.4 and above supports ARM  Linux Kernel 4.5.0 and above (maybe even earlier)  OFED – No support  Linux Distribution – on going process
  • 18. ©ARM 201718 Lessons Learned  Memory Barriers  Multithread environment  Software-hardware interaction  Examples https://github.com/openucx/ucx/blob/master/src/ucs/arch/aarch64/cpu.h#L25  You can “fish” for these bugs in MPI implementations around Eager-RDMA and shared memory protocols Write Payload Write Notify Busy-wait Read Notify Read Payload Write Barrier Read Barrier RDMA RDMA Maranget, Luc, Susmit Sarkar, and Peter Sewell. "A tutorial introduction to the ARM and POWER relaxed memory models." Draft available from http://www.cl. cam. ac. uk/~ pes20/ppc-supplemental/test7. pdf (2012).
  • 19. ©ARM 201719 Lessons Learned - continued  Low-level timers  Typically found in benchmarks and MPI  Code examples https://github.com/openucx/ucx/blob/master/src/ucs/arch/aarch64/cpu.h#L35
  • 20. ©ARM 201720 Lessons Learned – continued  Not all cache-lines are 64Byte !  Implementation dependent  128Byte and 64Byte http://xeroxnostalgia.com/duplicators/xerox-9200/
  • 21. ©ARM 201721 Optimizations  AVX => Neon  Mostly found around communication request initialization codes https://github.com/openucx/ucx/blob/891e20ef90257d1e2721da52461b0261220c82d8/src/uct/ ib/mlx5/ib_mlx5.inl#L160  Busy-wait loop  SeeWait-For-Event (WFE) Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“
  • 23. ©ARM 201723 Testbed • 2 x Softiron Overdrive 3000 servers with AMD Opteron A1100 / 2GHz • ConnectX-4 IB/VPI EDR (PCIe g2 x8) • Ubuntu 16.04 • MOFED 3.3-1.5.0.0 • UCX [0558b41] • XPMEM [bdfcc52] • OSHMEM/OPEN-MPI [fed4849]
  • 24. ©ARM 201724 OpenUCX and OpenSHMEM on ARM HPC Applications OpenSHMEM UCX SPML UCX Framework UCP MLX5 UCT Verbs UCT XPMEM UCT
  • 25. ©ARM 201725 OpenUCX with InfiniBand 40% Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“
  • 26. ©ARM 201726 SHMEM_WAIT() 73% 35% Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“
  • 27. ©ARM 201727 OpenSHMEM SSCA Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“ 7-30%
  • 28. ©ARM 201728 OpenSHMEM GUPs Pavel Shamis, M. Graham Lopez, and Gilad Shainer.“Enabling One-sided Communication Semantics on ARM“ 21%
  • 29. ©ARM 201729 Summary  Linux RDMA community is doing great job !  A lot of progress was made in ARM HPC/server software eco-system
  • 30. The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. Copyright © 2017 ARM Limited ©ARM 2017
  • 32. ©ARM 201732 CCIX  Accelerators and Network (NIC/HCA/etc.) as a first class “citizen” in the system  Seamless process and accelerator hardware cache coherence support  Low-latency and high-bandwidth  Allow in-line acceleration  Bump in the wire processing (network packet processing, storage acceleration, etc.)  Allows “off-line” acceleration (co-processor model)  Driver-less / interrupt-less usage model  http://www.ccixconsortium.com
  • 33. ©ARM 201733 CCIX CCIX Enables System Connectivity For All Elements (CPU, GPU, IO, FPGA)
  • 34. ©ARM 201734 Gen-Z  All data is accessed by some form of a Read or a Write  Example of reads: DDR Row + Column Read, PCI DMA Read, SCSIWrite, Socket Read, File Read, RDMA Read  Example of writes: DDR Row + Column Write, PCI DMA Write, SCSI Read, SocketWrite, FileWrite, RDMA Write  The Goal: Simplify world to memory semantic Reads &Writes
  • 35. ©ARM 201735 Gen-Z Overview  An open, standards-based, scalable, system interconnect and protocol.  Optimized to support memory semantic communications  Breaks Processor-Memory Interlock  Split controller model  Memory controller  Initiates high-level requests—Read,Write,Atomic, Put / Get, etc.  Enforces ordering, reliability, path selection, etc.  Media controller  Abstracts memory media  Supports volatile / non-volatile / mixed-media  Performs media-specific operations  Executes requests and returns responses  Enables data-centric computing (accelerator, compute, etc.)