GCC for ARMv8 Aarch64
2014
issue.hsu@gmail.com
New features
• Load-acquire and store-release atomics
• AdvSIMD usable for general purpose float math
• Larger PC-relative addressing and branching
• Literal pool access and most conditional branches are extended to ±1MB; unconditional branches and calls to ±128MB
• Non-temporal (cache skipping) load/store
• Load/store of a non-contiguous pair of registers
Registers
• 64-bit integer registers:
– X0 ~ X29, X30/LR, SP/XZR
• The only register with special semantics is register 31, which acts as either the stack pointer or the zero register, depending on the instruction
– Zero register
• Reads as zero when used as a source register, and discards the result when used as a destination register
– Stack pointer
• When used as a load/store base register
• In some arithmetic instructions
• X30/LR, the procedure-call link register, is not banked; an exception saves the restart PC to the target exception level’s ELR system register
Registers (cont)
• The bottom 32 bits of the registers are referred to as W0 .. W30
• Benefits
– Easier to do 64-bit arithmetic!
– Less need to spill to the stack
– Spare registers to keep more temporaries
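Structure Layout
struct foo {
    int32_t a;
    void* p;
    int32_t x;
};
struct foo {
    void* p;
    int32_t a;
    int32_t x;
};
[Figure: layout of the first struct foo on 32-bit and on 64-bit, and of the reordered struct foo on 64-bit]
• On 64-bit, the 8-byte pointer must be 8-byte aligned, so the first ordering needs 4 bytes of padding after a and 4 bytes of tail padding (24 bytes in total); putting the pointer first packs the struct into 16 bytes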
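The layout difference can be checked directly with sizeof and offsetof. This is a minimal C sketch. The struct names foo_orig and foo_reordered and the helper functions are hypothetical; the two structs are renamed only so that both orderings fit in one file. On an LP64 target such as AArch64 Linux, you would expect 24 bytes for the original order and 16 for the reordered one.

```c
#include <stddef.h>
#include <stdint.h>

/* Original order from the slide: int32_t, pointer, int32_t.
   On LP64 the pointer forces 4 bytes of padding after a and 4 bytes of tail padding. */
struct foo_orig {
    int32_t a;
    void *p;
    int32_t x;
};

/* Reordered: the pointer first, then the two int32_t members share one 8-byte slot. */
struct foo_reordered {
    void *p;
    int32_t a;
    int32_t x;
};

size_t foo_orig_size(void)      { return sizeof(struct foo_orig); }
size_t foo_reordered_size(void) { return sizeof(struct foo_reordered); }
size_t foo_orig_offset_p(void)  { return offsetof(struct foo_orig, p); }
```

On a 32-bit (ILP32) target both orderings come out at 12 bytes, because the pointer is only 4 bytes wide and 4-byte aligned.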
Data models
• ARM targeted two data models for 64-bit mode, to serve its key OS partners
– The first is LP64, where int is 32-bit while long and pointers are 64-bit; it is used by Linux, most UNIXes and OS X
– The other is LLP64, where int and long are 32-bit while long long and pointers are 64-bit; it is favored by Microsoft Windows
• -mabi=name
– Generate code for the specified data model.
– Permissible values are ‘ilp32’ for SysV-like data model where int, long int and
pointer are 32-bit, and ‘lp64’ for SysV-like data model where int is 32-bit, but long
int and pointer are 64-bit.
– The default depends on the specific target configuration. Note that the LP64 and
ILP32 ABIs are not link-compatible; you must compile your entire program with the
same ABI, and link with a compatible set of libraries.
Reference
http://www.unix.org/version2/whatsnew/lp64_wp.html
http://www.realworldtech.com/arm64/2/
http://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
Data models (cont)
struct foo {
int a;
long l;
int x;
};
Reference
http://www.linaro.org/assets/common/campus-party-presentation-Sept_2013.pdf
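The struct foo on this slide changes size with the data model, because only the width of long differs. This is a small C sketch that classifies the data model from the type widths and reports sizeof(struct foo). The enum and function names are hypothetical. The expected sizes assume the usual alignment rules: 24 bytes under LP64, 12 under LLP64 or ILP32.

```c
#include <stddef.h>

/* struct foo from the slide: its size depends on the width of long. */
struct foo {
    int a;
    long l;
    int x;
};

enum data_model { DM_LP64, DM_LLP64, DM_ILP32, DM_OTHER };

/* Classify the data model this file was compiled for from the widths
   of int, long and pointers. */
enum data_model current_data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return DM_LP64;   /* e.g. AArch64 Linux, -mabi=lp64 */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return DM_LLP64;  /* e.g. 64-bit Windows */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return DM_ILP32;  /* e.g. -mabi=ilp32, or 32-bit ARM */
    return DM_OTHER;
}

size_t foo_size(void)
{
    return sizeof(struct foo);
}
```

Code that stores pointers or sizes in long will therefore behave differently on LP64 and LLP64 targets. This is one reason the slide notes that LP64 and ILP32 objects cannot be linked together.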
Conditional instructions
• Instructions are unconditionally executed but use the condition flags as an extra input to the instruction
– Conditional branch
• B.cond (CBZ/CBNZ and TBZ/TBNZ test a register directly instead of the flags)
– Add/subtract with carry
• ADC, SBC
– Conditional compare
• CCMP
– Conditional select/set with increment, negate or invert
• CSEL, CSINC, CSINV, CSNEG (and aliases such as CSET, CINC, CNEG)
• Benchmarking showed these to be the most frequently used single conditional instructions
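This C sketch shows source patterns for which GCC for AArch64 typically uses these instructions at -O2. The function names are hypothetical, and the exact instruction selection depends on the GCC version and flags, so check the output of `gcc -O2 -S`. The 128-bit add makes explicit the carry that ADDS/ADC propagate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conditional select: typically CMP + CSEL, no branch. */
int64_t select_max(int64_t a, int64_t b)
{
    return a > b ? a : b;
}

/* Conditional set: the comparison result as 0/1, typically CMP + CSET. */
int is_below(uint64_t a, uint64_t b)
{
    return a < b;
}

/* Conditional compare: the second test can be chained with CCMP
   instead of a second conditional branch. */
bool both_match(int a, int b)
{
    return a == 1 && b == 2;
}

/* 128-bit addition from 64-bit halves: the carry out of the low half
   is what ADDS followed by ADC propagates. */
void add128(uint64_t alo, uint64_t ahi, uint64_t blo, uint64_t bhi,
            uint64_t *rlo, uint64_t *rhi)
{
    uint64_t lo = alo + blo;
    uint64_t carry = lo < alo;   /* unsigned wrap-around means a carry out */
    *rlo = lo;
    *rhi = ahi + bhi + carry;
}
```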
Addressing features (cont)
• Register indexed addressing
– A 64-bit index register can be added to a 64-bit base register
– A 32-bit value in an index register can be sign- or zero-extended before being added
• PC relative addressing
– PC-relative literal loads have an offset range of ±1MB. This permits fewer literal
pools, and more sharing of literal data between functions – reducing I-cache and
TLB pollution
– Most conditional branches have a range of ±1MiB, expected to be sufficient for
the majority of conditional branches which take place within a single function
– Unconditional branches, including branch and link, have a range of ±128MiB.
Expected to be sufficient to span the static code segment of most executable load
modules and shared objects, without needing linker-inserted trampolines or
“veneers”
– PC-relative load/store and address generation with a range of ±4GiB may be
performed inline using only two instructions, i.e. without the need to load an offset
from a literal pool
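Register-indexed addressing lets array indexing with a 32-bit index fold into a single load. In this sketch (function names hypothetical), the comments show the single-instruction forms GCC typically emits at -O2; treat them as expectations to verify with `gcc -O2 -S`, not guarantees.

```c
#include <stdint.h>

/* 32-bit signed index: typically ldr w0, [x0, w1, sxtw #2]. */
int32_t load_signed_index(const int32_t *base, int32_t i)
{
    return base[i];
}

/* 32-bit unsigned index: typically ldr x0, [x0, w1, uxtw #3]. */
int64_t load_unsigned_index(const int64_t *base, uint32_t i)
{
    return base[i];
}

/* 64-bit index: typically ldr w0, [x0, x1, lsl #2]. */
int32_t load_64bit_index(const int32_t *base, int64_t i)
{
    return base[i];
}
```

Without the extend-and-scale forms, each 32-bit index would need a separate SXTW or UXTW instruction before the load.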
An example for global variable access
extern int gVar;
int main(void)
{
    return gVar;
}
ARMv7-A: MOVW/MOVT build the 32-bit address of gVar in two instructions
    .arch armv7-a
    .text
    .align 2
    .global main
    .type main, %function
main:
    movw r3, #:lower16:gVar
    movt r3, #:upper16:gVar
    ldr r0, [r3, #0]
    bx lr
ARMv5TE: the address of gVar is loaded from a literal pool
    .arch armv5te
    .text
    .align 2
    .global main
    .type main, %function
main:
    ldr r3, .L3
    ldr r0, [r3, #0]
    bx lr
.L4:
    .align 2
.L3:
    .word gVar
ARMv8-A: ADRP forms the 4KB page address of gVar, and the LDR supplies the low 12 bits
    .arch armv8-a+fp+simd
    .section .text.startup
    .align 2
    .global main
    .type main, %function
main:
    adrp x0, gVar
    ldr w0, [x0,#:lo12:gVar]
    ret
Address Generation
• ADRP Xd, label
– Address of Page
– Sign extends a 21-bit offset, shifts it left by 12 and adds it to the value of the PC
with its bottom 12 bits cleared, writing the result to register Xd
– This computes the base address of the 4KB aligned memory region containing
label, and is designed to be used in conjunction with a load, store or ADD
instruction which supplies the bottom 12 bits of the label’s address
– This permits position-independent addressing of any location within ±4GB of the
PC using two instructions, provided that dynamic relocation is done with a
minimum granularity of 4KB
– The term “page” is short-hand for the 4KB relocation granule, and is not
necessarily related to the virtual memory page size
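The ADRP arithmetic described above can be modelled in a few lines of C. This is a sketch only: the function names and the ADRP_PAGE_MASK macro are hypothetical. `adrp_imm21_for` mirrors what an assembler or linker would compute for `adrp xN, target`, and `lo12` is the part that the following LDR/STR/ADD supplies via `:lo12:`.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADRP_PAGE_MASK (~UINT64_C(0xFFF))   /* clears the bottom 12 bits */

/* Model of ADRP Xd, label: imm21 is the already sign-extended 21-bit immediate. */
uint64_t adrp_result(uint64_t pc, int32_t imm21)
{
    int64_t offset = (int64_t)imm21 * 4096;            /* imm21 << 12 */
    return (pc & ADRP_PAGE_MASK) + (uint64_t)offset;
}

/* Immediate needed so that ADRP at pc reaches the 4KB page containing target;
   returns false if that page is outside the +/-4GB ADRP range. */
bool adrp_imm21_for(uint64_t pc, uint64_t target, int32_t *imm21)
{
    int64_t delta = (int64_t)((target & ADRP_PAGE_MASK) - (pc & ADRP_PAGE_MASK));
    int64_t pages = delta / 4096;                      /* exact: both are 4KB-aligned */
    if (pages < -(INT64_C(1) << 20) || pages > (INT64_C(1) << 20) - 1)
        return false;
    *imm21 = (int32_t)pages;
    return true;
}

/* The :lo12: part supplied by the following load, store or ADD. */
uint64_t lo12(uint64_t target)
{
    return target & 0xFFF;
}
```

For example, with ADRP at 0x400123 and gVar at 0x4A1234, the page delta is 0xA1 pages. ADRP then yields 0x4A1000, and adding lo12 = 0x234 recovers 0x4A1234.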
Address Generation (cont)
• ADR Xd, label
– Address
– Adds a 21-bit signed byte offset to the program counter, writing the result to
register Xd
– Used to compute the effective address of any location within ±1MiB of the PC
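By contrast, ADR adds the signed 21-bit offset directly, in bytes, with no page rounding. This reachability check is a minimal C sketch; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A signed 21-bit byte offset covers [-2^20, 2^20 - 1], i.e. about +/-1MiB. */
#define ADR_MIN (-(INT64_C(1) << 20))
#define ADR_MAX ((INT64_C(1) << 20) - 1)

/* Can "adr xN, target" placed at pc reach target? */
bool adr_reachable(uint64_t pc, uint64_t target)
{
    int64_t offset = (int64_t)(target - pc);
    return offset >= ADR_MIN && offset <= ADR_MAX;
}

/* Model of ADR: imm21 is the already sign-extended immediate. */
uint64_t adr_result(uint64_t pc, int32_t imm21)
{
    return pc + (uint64_t)(int64_t)imm21;
}
```

When the target is out of ADR range, a toolchain can fall back to the two-instruction ADRP + ADD sequence, which reaches ±4GB.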
The program counter (PC)
• The PC is not a general-purpose register: it cannot be used as an operand of arithmetic or load/store instructions
• Instructions that implicitly read the PC
– PC-relative address computation instructions
• ADR, ADRP, literal load, direct branch
• The value read is the address of the instruction itself; there is no implied offset of 4 or 8 bytes as in AArch32
– Branch-and-link instructions
• BL and BLR write the return address (the address of the next instruction) to X30/LR
• Instructions that modify the PC
– Explicit control flow instructions
• [Un]conditional branch, exception generation, exception return instructions
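Because the PC reads as the address of the branch itself, a branch offset is simply `target - pc`. This C sketch encodes an unconditional B or BL. The function name is hypothetical; the opcode bits (0x14000000 for B, 0x94000000 for BL) and the signed 26-bit word offset come from the A64 encoding. The sketch also shows where the ±128MiB limit comes from.

```c
#include <stdbool.h>
#include <stdint.h>

/* B/BL hold a signed 26-bit word offset (imm26), so the byte offset must be a
   multiple of 4 within [-2^27, 2^27 - 4], about +/-128MiB. The offset is
   measured from the branch instruction itself (no +4/+8 as in AArch32). */
bool encode_branch(uint64_t pc, uint64_t target, bool link, uint32_t *insn)
{
    int64_t offset = (int64_t)(target - pc);
    if (offset % 4 != 0)
        return false;                        /* instructions are 4-byte aligned */
    if (offset < -(INT64_C(1) << 27) || offset > (INT64_C(1) << 27) - 4)
        return false;                        /* out of range: a linker would need a veneer */
    uint32_t imm26 = (uint32_t)(offset / 4) & 0x03FFFFFFu;
    *insn = (link ? 0x94000000u : 0x14000000u) | imm26;
    return true;
}
```

For example, a B at 0x1000 targeting 0x1008 encodes as 0x14000002, and a BL to the preceding instruction encodes as 0x97FFFFFF.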
Memory Load-Store
• Bulk transfers
– LDM, STM, PUSH and POP do not exist in AArch64
– LDP and STP load and store a pair of independent registers from/to consecutive memory locations, and support unaligned addresses when accessing Normal memory
– LDNP and STNP provide a streaming or non-temporal hint that the data does not need to be retained in caches
• This is a special exception to the normal memory-ordering rules: where an address dependency exists between two memory reads and the second read was generated by an LDNP, then, in the absence of any other barrier mechanism to achieve order, those memory accesses can be observed in any order by other observers within the shareability domain of the memory addresses being accessed.
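With no LDM/STM, the compiler pairs adjacent accesses instead. In this sketch (function names hypothetical), GCC for AArch64 at -O2 will usually merge each pair of adjacent 64-bit accesses into a single LDP and a single STP. This is a typical outcome, not a guarantee. Function prologues use the same idea to save FP and LR with one STP.

```c
#include <stdint.h>

/* Two adjacent 64-bit loads and stores: usually one LDP and one STP. */
void copy_pair(uint64_t *dst, const uint64_t *src)
{
    uint64_t a = src[0];
    uint64_t b = src[1];
    dst[0] = a;
    dst[1] = b;
}

/* The registers loaded by LDP are independent, so a swap can be
   one LDP followed by one STP with the registers in the opposite order. */
void swap_pair(uint64_t *p)
{
    uint64_t a = p[0];
    uint64_t b = p[1];
    p[0] = b;
    p[1] = a;
}
```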
Memory Load-Store (cont)
• Exclusive accesses
– LDXR, LDXP, STXR, STXP
– Exclusive access to a pair of doublewords permits atomic updates of a pair of
pointers
– Accesses must be naturally aligned; an exclusive pair access must be aligned to twice the data
size
• Load-acquire, store-release
– LDAR, STLR, LDAXR, STLXR
– Explicitly synchronizing load and store instructions (release-consistency memory
model)
– Reducing the need for explicit memory barriers
– Require natural address alignment
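In C these instructions are normally reached through C11 `<stdatomic.h>` rather than hand-written assembly. This sketch uses hypothetical names. On ARMv8.0, GCC typically expands the compare-and-swap into an LDXR/STXR-style retry loop (with acquire/release variants as the memory order requires). With the ARMv8.1 LSE extension it can use a single CAS instead. The acquire/release orders map naturally onto load-acquire/store-release, so no separate DMB should be needed. Check the generated code for your GCC version and -march.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic add written as a compare-and-swap loop; returns the old value. */
unsigned fetch_add_cas(atomic_uint *counter, unsigned delta)
{
    unsigned old = atomic_load_explicit(counter, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(counter, &old, old + delta,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
        ;   /* on failure, old now holds the current value: retry */
    return old;
}

/* Minimal spinlock: acquire ordering on lock, release ordering on unlock. */
typedef struct {
    atomic_bool locked;
} spinlock;

void spin_lock(spinlock *l)
{
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
        ;   /* spin while the previous value was true (lock held) */
}

void spin_unlock(spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);  /* typically STLR */
}
```

The weak compare-exchange is used because an exclusive store can fail spuriously; the loop absorbs those failures.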
Memory Load-Store (cont)
• Prefetch Memory
– Supports the following addressing modes:
• Base plus a scaled 12-bit unsigned immediate offset, or base plus an unscaled 9-bit signed immediate offset
• Base plus a 64-bit register offset, optionally scaled by 8 (LSL #3)
• Base plus a 32-bit extended register offset, optionally scaled by 8
• PC-relative literal.
– PRFM <prfop>, addr | label
• <prfop> is defined as <type><target><policy>
• <type>: PLD (prefetch for load), PST (prefetch for store), PLI (preload instructions)
• <target>: L1 (level 1 cache), L2 (level 2 cache), L3 (level 3 cache)
• <policy>
– KEEP: Retained or temporal prefetch, allocated in the cache normally
– STRM: Streaming or non-temporal prefetch, for data that is used only once
• PLDL1KEEP, PSTL2STRM, PLIL3KEEP
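From C, PRFM is usually reached through GCC's `__builtin_prefetch(addr, rw, locality)`. On AArch64, rw = 0 selects a PLD (prefetch for load) form and rw = 1 a PST form. In recent GCC versions, locality 3 maps to an L1KEEP hint and locality 0 to a streaming (STRM) hint. The exact mapping is version dependent, so check the generated assembly. The prefetch distance below is a hypothetical tuning value, not a recommendation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 64

/* Sum an array while prefetching PREFETCH_DISTANCE elements ahead. */
int64_t sum_with_prefetch(const int32_t *data, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);  /* read, keep in cache */
        sum += data[i];
    }
    return sum;
}
```

Prefetches are hints and do not change the result, so the function returns the same sum with or without them. Whether they help depends on the core's hardware prefetcher and the access pattern.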
Floating Point
• There is no “soft-float” variant of the AArch64 Procedure Call Standard
• The deprecated short vector feature of VFP is removed
• Load/store addressing modes are identical to integer load/store
• FCSEL/FCCMP are the floating-point equivalents of the integer CSEL/CCMP instructions
– FCMP/FCCMP set the integer condition flags (NZCV) directly, so no flag transfer from a floating-point status register is needed as in AArch32
• All floating-point multiply-add and multiply-sub instructions are “fused”
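These C functions (names hypothetical) show source patterns that GCC for AArch64 typically compiles to FCMP + FCSEL and to FMADD respectively. Whether `a * b + c` is contracted into a fused FMADD depends on `-ffp-contract`. GCC's default is "fast" in its GNU C modes and "off" under strict ISO modes such as -std=c11. Because FMADD is fused, the product is not rounded before the addition.

```c
/* Floating-point select: typically FCMP (setting NZCV) followed by FCSEL, no branch. */
float select_positive(float x, float if_pos, float if_not)
{
    return x > 0.0f ? if_pos : if_not;
}

/* Multiply-add: may become a single fused FMADD when contraction is enabled. */
double mul_add(double a, double b, double c)
{
    return a * b + c;
}
```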
Scalar/SIMD Registers
• SIMD and scalar floating point share a register bank
– 32 bit float registers: S0 ... S31
– 64 bit double registers: D0 ... D31
– 128 bit SIMD registers: V0 ... V31
• S0 is bottom 32 bits of D0 which is the bottom 64 bits of V0
System instructions
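• System register access
– There is no CPSR as a single register; the individual PSTATE fields (e.g. NZCV, DAIF) are accessed as special-purpose registers with system instructions
– MRS, MSR
• Barriers
– DMB, DSB, ISB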
Weakly ordered memory model
• On ARM MP systems, programmers writing multithreaded code also have to deal with a weakly ordered memory model.
• Unlike x86, but like AArch32 and PowerPC, the order in which writes become visible to other cores isn't guaranteed. Deal with it:
– use mutexes!
– barrier instructions DMB, DSB, ISB
– ARMv8: Load-Acquire/Store-Release instructions: LDAR, STLR
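The classic case where ordering matters is message passing: one thread writes a payload and then sets a flag, and another waits for the flag and then reads the payload. This sketch uses C11 atomics; the variable and function names are hypothetical. The release store and acquire load are what GCC for AArch64 typically emits as STLR and LDAR, with no DMB. With plain stores and loads, the consumer could see ready set but read a stale payload.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary data, published via ready */
static atomic_bool ready;    /* zero-initialised: false */

/* Producer: write the data, then publish it with a release store. */
void produce(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);   /* typically STLR */
}

/* Consumer: if the acquire load sees ready, the payload write is visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))     /* typically LDAR */
        return false;
    *out = payload;
    return true;
}
```

In real use, produce and try_consume would run on different threads. They are shown without thread creation to keep the sketch self-contained.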
Aarch64 call convention
• Arguments and return values in registers
– X0 - X7 arguments and return value
– X8 indirect result (struct) location
– X9 - X15 temporary registers
– X16 - X17 intra-procedure-call scratch registers (IP0/IP1, used by PLT stubs and linker veneers)
– X18 platform register (platform-specific use; otherwise a temporary register)
– X19 - X28 callee-saved registers
– X29 frame pointer
– X30 link register
– SP stack pointer (shares register encoding 31 with XZR)
Reference
IHI0055B_aapcs64.pdf
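Two consequences of these rules are visible from plain C (function and struct names are hypothetical). Only eight integer arguments fit in X0-X7, so a ninth goes on the stack. A composite larger than 16 bytes cannot be returned in X0/X1, so the caller passes the address of a result buffer in X8. Both behaviours follow from AAPCS64.

```c
#include <stdint.h>

/* a..h arrive in X0-X7; the ninth argument i is passed on the stack. */
int64_t sum9(int64_t a, int64_t b, int64_t c, int64_t d, int64_t e,
             int64_t f, int64_t g, int64_t h, int64_t i)
{
    return a + b + c + d + e + f + g + h + i;
}

/* 24 bytes: too large for X0/X1, so the result is written to the
   memory whose address the caller placed in X8. */
struct triple {
    int64_t x, y, z;
};

struct triple make_triple(int64_t v)
{
    struct triple t = { v, v * 2, v * 3 };
    return t;
}
```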
Aarch64 call convention floats
• VFP/SIMD is mandatory: there is no soft-float ABI
– V0 - V7 arguments and return value
– D8 - D15 callee-saved registers
– V16 - V31 temporary registers
• Bits 64-127 of V8 - V15 are not preserved across calls
Reference
IHI0055B_aapcs64.pdf
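Floating-point arguments use V0-V7 in the same way that integer arguments use X0-X7. AAPCS64 also treats a homogeneous floating-point aggregate (HFA), a struct of up to four members of the same floating-point type, as a group of consecutive V registers. In this sketch (names hypothetical), `a` would arrive in D0-D2, `b` in D3-D5 and `s` in D6, and the result would return in D0-D2, all without touching memory.

```c
/* Three doubles of the same type: an HFA under AAPCS64. */
struct vec3 {
    double x, y, z;
};

/* Computes a * s + b component-wise. */
struct vec3 vec3_axpy(struct vec3 a, struct vec3 b, double s)
{
    struct vec3 r = { a.x * s + b.x, a.y * s + b.y, a.z * s + b.z };
    return r;
}
```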

 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 

GCC for ARMv8 Aarch64

  • 1. GCC for ARMv8 Aarch64 2014 issue.hsu@gmail.com
  • 2. New features
    • Load-acquire and store-release atomics
    • AdvSIMD usable for general-purpose floating-point math
    • Larger PC-relative addressing and branching
    • Literal pool access and most conditional branches are extended to ±1MB; unconditional branches and calls to ±128MB
    • Non-temporal (cache-skipping) load/store
    • Load/store of a non-contiguous pair of registers
  • 3. Registers
    • 64-bit integer registers:
      – X0 ~ X29, X30/LR, SP/ZERO
    • The only register with special semantics is register 31, which acts as either the stack pointer or the zero register, depending on the instruction
      – Zero register
        • Reads as zero when used as a source register, and discards the result when used as a destination register
      – Stack pointer
        • When used as a load/store base register
        • In some arithmetic instructions
    • X30/LR, the procedure-call link register, is not banked; on an exception, the restart PC is saved to the target exception level's ELR system register instead
  • 4. Registers (cont)
    • The bottom 32 bits of each register are referred to as W0 .. W30
    • Benefits
      – Easier to do 64-bit arithmetic
      – Less need to spill to the stack
      – Spare registers to keep more temporaries
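  • 5. Structure Layout
    /* 32-bit: 12 bytes. 64-bit: 24 bytes (4 bytes of padding after a, 4 after x) */
    struct foo {
        int32_t a;
        void* p;
        int32_t x;
    };
    /* 64-bit, pointer first: 16 bytes, no padding */
    struct foo {
        void* p;
        int32_t a;
        int32_t x;
    };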
  • 6. Data models
    • ARM targeted two data models for the 64-bit mode, to address its key OS partners
      – The first is LP64, where int is 32-bit while long and pointers are 64-bit; it is used by Linux, most UNIXes and OS X
      – The other is LLP64, where int and long are 32-bit while long long and pointers are 64-bit; it is favored by Microsoft Windows
    • -mabi=name
      – Generate code for the specified data model.
      – Permissible values are ‘ilp32’ for a SysV-like data model where int, long int and pointers are 32-bit, and ‘lp64’ for a SysV-like data model where int is 32-bit, but long int and pointers are 64-bit.
      – The default depends on the specific target configuration. Note that the LP64 and ILP32 ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.
    Reference
      http://www.unix.org/version2/whatsnew/lp64_wp.html
      http://www.realworldtech.com/arm64/2/
      http://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
  • 7. Data models (cont)
    struct foo {
        int a;
        long l;
        int x;
    };
    Reference: http://www.linaro.org/assets/common/campus-party-presentation-Sept_2013.pdf
  • 8. Conditional instructions
    • Instructions are executed unconditionally but use the condition flags as an extra input
      – Conditional branch
        • CBZ, B.cond (CBZ/CBNZ test a register for zero rather than reading the flags)
      – Add/subtract with carry
        • ADC, SBC
      – Conditional compare
        • CCMP
      – Conditional select/set with increment, negate or invert
        • Benchmarking showed these to be the most frequently used single conditional instructions
        • CSEL, CSET
  • 9. Addressing features (cont)
    • Register indexed addressing
      – A 64-bit index register can be added to a 64-bit base register
      – A 32-bit value in the index register can be sign- or zero-extended
    • PC-relative addressing
      – PC-relative literal loads have an offset range of ±1MB. This permits fewer literal pools and more sharing of literal data between functions, reducing I-cache and TLB pollution
      – Most conditional branches have a range of ±1MiB, expected to be sufficient for the majority of conditional branches, which take place within a single function
      – Unconditional branches, including branch and link, have a range of ±128MiB, expected to be sufficient to span the static code segment of most executable load modules and shared objects without needing linker-inserted trampolines or “veneers”
      – PC-relative load/store and address generation with a range of ±4GiB can be performed inline using only two instructions, i.e. without loading an offset from a literal pool
  • 10. An example for global variable access
    C source:
          extern int gVar;
          int main(void) { return gVar; }

    ARMv7-A:
                  .arch armv7-a
                  .text
                  .align 2
                  .global main
                  .type main, %function
          main:
                  movw r3, #:lower16:gVar
                  movt r3, #:upper16:gVar
                  ldr r0, [r3, #0]
                  bx lr

    ARMv5TE:
                  .arch armv5te
                  .text
                  .align 2
                  .global main
                  .type main, %function
          main:
                  ldr r3, .L3
                  ldr r0, [r3, #0]
                  bx lr
          .L4:
                  .align 2
          .L3:
                  .word gVar

    ARMv8-A:
                  .arch armv8-a+fp+simd
                  .section .text.startup
                  .align 2
                  .global main
                  .type main, %function
          main:
                  adrp x0, gVar
                  ldr w0, [x0,#:lo12:gVar]
                  ret
  • 11. Address Generation
    • ADRP Xd, label (Address of Page)
      – Sign-extends a 21-bit offset, shifts it left by 12 and adds it to the value of the PC with its bottom 12 bits cleared, writing the result to register Xd
      – This computes the base address of the 4KB-aligned memory region containing label, and is designed to be used together with a load, store or ADD instruction that supplies the bottom 12 bits of the label’s address
      – This permits position-independent addressing of any location within ±4GB of the PC using two instructions, provided that dynamic relocation is done with a minimum granularity of 4KB
      – The term “page” is shorthand for the 4KB relocation granule, and is not necessarily related to the virtual memory page size
  • 12. Address Generation (cont)
    • ADR Xd, label (Address)
      – Adds a 21-bit signed byte offset to the program counter, writing the result to register Xd
      – Used to compute the effective address of any location within ±1MiB of the PC
  • 13. The program counter (PC)
    • The PC is not a general-purpose register and cannot be used as an operand of arithmetic or load/store instructions
    • Instructions that implicitly read the PC
      – PC-relative address computation instructions
        • ADR, ADRP, literal load, direct branch
        • The value read is the address of the instruction itself; there is no implied offset of 4 or 8 bytes as in AArch32
      – Branch-and-link instructions
        • BL, BLR store the return address (the address of the next instruction) in the link register
    • Instructions that implicitly modify the PC
      – Explicit control-flow instructions
        • [Un]conditional branch, exception generation and exception return instructions
  • 14. Memory Load-Store
    • Bulk transfers
      – LDM, STM, PUSH, POP do not exist in AArch64
      – LDP and STP load and store a pair of independent registers from/to consecutive memory locations, and support unaligned addresses when accessing normal memory
      – LDNP and STNP provide a streaming or non-temporal hint that the data does not need to be retained in caches
        • A special exception to the normal memory-ordering rules: if an address dependency exists between two memory reads and the second read was generated by an LDNP, then, in the absence of any other barrier mechanism to achieve order, those memory accesses can be observed in any order by other observers within the shareability domain of the memory addresses being accessed.
  • 15. Memory Load-Store (cont)
    • Exclusive accesses
      – LDXR, LDXP, STXR, STXP
      – Exclusive access to a pair of doublewords permits atomic updates of a pair of pointers
      – Accesses must be naturally aligned; an exclusive pair access must be aligned to twice the data size
    • Load-acquire, store-release
      – LDAR, STLR, LDAXR, STLXR
      – Explicitly synchronizing load and store instructions (release-consistency memory model)
      – Reduce the need for explicit memory barriers
      – Require natural address alignment
  • 16. Memory Load-Store (cont)
    • Prefetch memory
      – Supports the following addressing modes:
        • Base plus a scaled 12-bit unsigned immediate offset, or base plus an unscaled 9-bit signed immediate offset
        • Base plus a 64-bit register offset, optionally scaled by 8 (LSL #3)
        • Base plus a 32-bit extended register offset, optionally scaled by 8
        • PC-relative literal
      – PRFM <prfop>, addr | label
        • <prfop> is defined as <type><target><policy>
        • <type>: PLD (prefetch for load), PST (prefetch for store), PLI (preload instructions)
        • <target>: L1 (level 1 cache), L2 (level 2 cache), L3 (level 3 cache)
        • <policy>
          – KEEP: retained or temporal prefetch, allocated in the cache normally
          – STRM: streaming or non-temporal prefetch, for data that is used only once
        • Examples: PLDL1KEEP, PSTL2STRM, PLIL3KEEP
  • 17. Floating Point
    • There is no “soft-float” variant of the AArch64 Procedure Call Standard
    • The deprecated short-vector feature of VFP is removed
    • Load/store addressing modes are identical to integer load/store
    • FCSEL/FCCMP are equivalent to the integer CSEL/CCMP instructions
      – FCMP/FCCMP set the integer condition flags (NZCV) directly; there is no separate flag transfer as with VCMP/VMRS in AArch32
    • All floating-point multiply-add and multiply-subtract instructions are “fused”
  • 18. Scalar/SIMD Registers
    • SIMD and scalar floating point share a register bank
      – 32-bit float registers: S0 ... S31
      – 64-bit double registers: D0 ... D31
      – 128-bit SIMD registers: V0 ... V31
    • S0 is the bottom 32 bits of D0, which is the bottom 64 bits of V0
  • 19. System instructions
    • System register access
      – There is no CPSR to access as a single register; processor state fields are read and written with system instructions
      – MRS, MSR
    • Barriers
      – DMB, DSB, ISB
  • 20. Weakly ordered memory model
    • On ARM multiprocessor systems, programmers using threads also have to deal with a weak memory model.
    • Unlike x86, but like AArch32 and PowerPC, the order of writes to memory as seen by other cores isn't guaranteed. Deal with it:
      – use mutexes!
      – barrier instructions DMB, DSB, ISB
      – ARMv8: Load-Acquire/Store-Release instructions LDAR, STLR
  • 21. AArch64 calling convention
    • Arguments and return values in registers
      – X0 - X7: arguments and return value
      – X8: indirect result (struct) location
      – X9 - X15: temporary registers
      – X16 - X17: intra-procedure-call scratch registers (PLT, linker)
      – X18: platform-specific use (e.g. TLS)
      – X19 - X28: callee-saved registers
      – X29: frame pointer
      – X30: link register
      – SP: stack pointer (shares register number 31 with XZR)
    Reference: IHI0055B_aapcs64.pdf
  • 22. AArch64 calling convention: floating point
    • VFP/SIMD is mandatory; there is no soft-float ABI
      – V0 - V7: arguments and return value
      – D8 - D15: callee-saved registers
      – V16 - V31: temporary registers
    • Only the bottom 64 bits of V8 - V15 are preserved; bits 64-127 are not saved
    Reference: IHI0055B_aapcs64.pdf

Editor's Notes

  1. Address dependency: an address dependency exists when the value returned by a read access is used to compute the address of a subsequent read or write access. The address dependency exists even if the value read by the first read access does not change the address of the second read or write access.
  2. SVC: generate an exception targeting EL1
     HVC: generate an exception targeting EL2
     SMC: generate an exception targeting EL3
     DCPSn: debug change processor state to ELn
     DRPS: debug restore processor state
     WFE: wait for event
     WFI: wait for interrupt
     SEV: send event