2. Agenda
Introduction to OpenJDK & Arm64
About Our Port
Background
Windows and Arm64 Nuances
Timeline
Testing and Benchmarking
Next Steps
It’s Demo Time!
4. What is The OpenJDK Project?
Open Java Development Kit
• Free and open source reference implementation of Java SE (Standard Edition)
• Licensed under GNU GPL version 2 with Classpath Exception
OpenJDK Timeline
• 2006: Java HotSpot Virtual Machine
• 2007: Almost all the Java Development Kit
• 2010: 100% of the Java Development Kit
5. The OpenJDK Community
Getting Involved Is Caring
OpenJDK Timeline
• 2006: Java HotSpot Virtual Machine
• 2007: Almost all the Java Development Kit
• 2010: 100% of the Java Development Kit
• 2013 +
Community milestones along the timeline:
• RedHat signs the Sun/Oracle Contributor Agreement and TCK License Agreement
• Porters Group created to extend OpenJDK on different processor architectures and OSes
• IBM, Apple, SAP join the OpenJDK Community
• Microsoft
6. What is Arm?
RISC-y Business?
• Reduced Instruction Set Computer
• Highly optimized instruction set
• Large number of registers
• Load/Store architecture
[Diagram labels: Memory; Data Processing Instructions]
7. What is Arm?
RISC-y Business?
• Reduced Instruction Set Computer
• Highly optimized instruction set
• Large number of registers
• Load/Store architecture:
  • Memory accessed via specific instructions
  • E.g. LDR Rt, <addr>, where Rt is the integer register
[Diagram labels: Memory Access Instructions; Processor; Register; Memory]
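To make the load/store idea concrete, here is a minimal C++ sketch. The function name `increment_in_memory` is hypothetical, and the assembly in the comments is only what an AArch64 compiler would typically emit for it, not output captured from a specific toolchain.

```cpp
#include <cassert>

// Hypothetical helper: increments a value that lives in memory.
// On a load/store ISA, arithmetic cannot operate on memory directly, so an
// AArch64 compiler would typically emit something like:
//   ldr w8, [x0]      // memory access instruction: load into a register
//   add w8, w8, #1    // data processing instruction: registers only
//   str w8, [x0]      // memory access instruction: store the result back
void increment_in_memory(int* p) {
  *p = *p + 1;
}
```

For instance, after `int x = 41; increment_in_memory(&x);` the value of `x` is 42.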
8. What is Arm64 aka AArch64?
What's in a Name?
• 64-bit ISA
• 64-bit wide integer registers, data and pointers
• Weak memory model with multiple-copy atomicity
• Barriers/fences are needed for access ordering
  • E.g. ISB, an instruction that flushes the CPU pipeline, etc.
  • E.g. one-way barriers (LDAR, etc.)
inline void OrderAccess::loadload() { acquire(); }
inline void OrderAccess::storestore() { release(); }
inline void OrderAccess::loadstore() { acquire(); }
inline void OrderAccess::storeload() { fence(); }
#define READ_MEM_BARRIER atomic_thread_fence(std::memory_order_acquire);
#define WRITE_MEM_BARRIER atomic_thread_fence(std::memory_order_release);
#define FULL_MEM_BARRIER atomic_thread_fence(std::memory_order_seq_cst);
inline void OrderAccess::acquire() {
READ_MEM_BARRIER;
}
inline void OrderAccess::release() {
WRITE_MEM_BARRIER;
}
inline void OrderAccess::fence() {
FULL_MEM_BARRIER;
}
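The fences above matter whenever one thread publishes data that another thread reads. The following sketch shows the same acquire/release fences in a standalone message-passing pattern. All names here (`payload`, `ready`, `publish`, `consume`, `run_handoff`) are hypothetical and are not part of HotSpot.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state for illustration only (not HotSpot code).
static int payload = 0;
static std::atomic<bool> ready{false};

// Writer: store the payload, then raise the flag.
void publish(int value) {
  payload = value;
  // Release fence: the payload store cannot be reordered after the flag store.
  std::atomic_thread_fence(std::memory_order_release);
  ready.store(true, std::memory_order_relaxed);
}

// Reader: wait for the flag, then read the payload.
int consume() {
  while (!ready.load(std::memory_order_relaxed)) {
    std::this_thread::yield();
  }
  // Acquire fence: the payload load cannot be reordered before the flag load.
  std::atomic_thread_fence(std::memory_order_acquire);
  return payload;
}

// Runs one reader and one writer thread; returns what the reader observed.
int run_handoff(int value) {
  ready.store(false, std::memory_order_relaxed);
  int seen = 0;
  std::thread reader([&seen] { seen = consume(); });
  std::thread writer([value] { publish(value); });
  writer.join();
  reader.join();
  return seen;
}
```

Without the two fences, a weakly ordered machine such as Arm64 would be allowed to let the reader observe the flag before the payload; with them, the reader is guaranteed to see the value that was published.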
9. Arm64 ISA and Third-Party Systems
The More, The Merrier
Arm64 ISA Timeline
• Arm v8: eMAG system. Ampere Computing’s product. Also known as Applied Micro’s XGene3 and Skylark
• Arm v8.1: ThunderX2 system. Cavium Inc.’s product. Now owned by Marvell Technology Group
• Arm v8.2: Surface Pro X system. Microsoft and Qualcomm’s collaboration: SQ1 & SQ2 processors
• Arm v8.3 …
10. About Our Port
Background
Windows & Arm64 Nuances
Timeline
11. What is an OpenJDK Port?
Write Once Run Anywhere... ?
• Windows on Arm64 is a new platform for OpenJDK
• To develop in Java, we need to get the JDK to work on the new platform
• To run Java applications on this new platform, we need to get the Java Runtime Environment (JRE) to work on it
• A JDK is a superset of a JRE, as it includes tools and utilities for development and debugging
[Diagram: A JRE = Java HotSpot VM (Runtime, Execution Engine) + Class Libraries (UI Toolkits; Base, Lang, Util Libs)]
12. Quick Overview of HotSpot
What’s in a VM?
• Most of the code is neither OS specific nor architecture specific
  • E.g. Memory Management, Compilers, Metadata, etc.
• The rest is organized as follows:
  • Architecture specific code
    • E.g. AArch64, x86 …
  • OS specific code
    • E.g. Windows, Linux …
  • OS and architecture specific code
    • E.g. Windows_AArch64, Linux_AArch64 …
[Diagram: HotSpot Source Repo: Share, CPU, OS, OS_CPU]
13. What is a Runtime?
Bytecode to Native Code
• Classloading, Interpretation, Bytecode verification, Exception Handling, Thread Management, Synchronization
• Our changes: JDK launcher updated to use structured exception handling on Windows for JVM creation and destruction
[Diagram: Java HotSpot VM = Runtime + Execution Engine]
14. What is an Execution Engine?
A Smart, Adaptive, Profile Guided Orchestrator…
• Compilation and Heap Management
• Most of our changes were in the execution engine: adapting the Arm64 architecture specific code for Windows, and adding Windows-Arm64 specific code to HotSpot in general
[Diagram: Java HotSpot VM = Runtime + Execution Engine]
15. Windows and AArch64 Specific Learnings
What’s in a Register?
• Register R18 is a platform register and has special meanings in kernel and user mode
  • We need to avoid using it for HotSpot purposes and treat it as “Reserved” (R18 = non_allocatable_reg)
• Invalidating the instruction cache: flushing a range of bytes in ICache
// Interface for updating the instruction cache. Whenever the VM modifies code, part of the processor
// instruction cache potentially has to be flushed.
class ICache : public AbstractICache {
<snip>
static void invalidate_range(address start, int nbytes) {
FlushInstructionCache((HANDLE)GetCurrentProcess(), start, (SIZE_T)(nbytes));
}
};
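As a standalone illustration of the same idea, the sketch below wraps the flush in a free function. The name `flush_icache_range` is hypothetical; the Windows branch uses the same `FlushInstructionCache` call as the slide, and the non-Windows branch uses the GCC/Clang builtin `__builtin___clear_cache` purely as an assumed stand-in so that the example builds elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical helper (not HotSpot's ICache class): flush the instruction
// cache for [start, start + nbytes) after code in that range was modified.
inline bool flush_icache_range(void* start, std::size_t nbytes) {
#if defined(_WIN32)
  // Same Win32 call as on the slide; a nonzero return value means success.
  return FlushInstructionCache(GetCurrentProcess(), start, nbytes) != 0;
#else
  // Assumed fallback for GCC/Clang toolchains; it reports no status.
  char* begin = static_cast<char*>(start);
  __builtin___clear_cache(begin, begin + nbytes);
  return true;
#endif
}
```

A JIT would call such a helper right after writing or patching machine code and before jumping to it, because Arm64 does not keep the instruction cache coherent with data writes automatically.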
16. Windows and AArch64 Specific Learnings
Features, features, everywhere!
• Cache line size optimized assembly stubs for copying
• CPU feature detection for AES, CRC32, etc.
void VM_Version::get_os_cpu_info() {
if (IsProcessorFeaturePresent(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE)) _features |= CPU_CRC32;
if (IsProcessorFeaturePresent(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE)) _features |= CPU_AES | CPU_SHA1 | CPU_SHA2;
if (IsProcessorFeaturePresent(PF_ARM_VFP_32_REGISTERS_AVAILABLE)) _features |= CPU_ASIMD;
…
Callouts: CPU features detection; Platform optimized byte swaps and copy stubs
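The detection pattern above (probe the OS, then accumulate bits in a feature mask) can be sketched in isolation as follows. The flag values, `ProbeResult`, `features_from`, `has_feature`, and `probe_os` are all hypothetical; only `IsProcessorFeaturePresent` and the `PF_ARM_*` constants come from the Windows SDK, and the probe is compiled in only when those constants are available.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical flag values; HotSpot defines its own.
enum CpuFeature : std::uint32_t {
  CPU_CRC32 = 1u << 0,
  CPU_AES   = 1u << 1,
  CPU_SHA1  = 1u << 2,
  CPU_SHA2  = 1u << 3,
  CPU_ASIMD = 1u << 4
};

// Raw answers from the OS, kept separate so the mapping is testable anywhere.
struct ProbeResult {
  bool crc32;
  bool crypto;
  bool vfp32;
};

// Maps OS answers to a feature bitmask, mirroring the slide's logic.
std::uint32_t features_from(const ProbeResult& p) {
  std::uint32_t features = 0;
  if (p.crc32)  features |= CPU_CRC32;
  if (p.crypto) features |= CPU_AES | CPU_SHA1 | CPU_SHA2;
  if (p.vfp32)  features |= CPU_ASIMD;
  return features;
}

// True when the given feature bit is set in the mask.
bool has_feature(std::uint32_t features, CpuFeature f) {
  return (features & f) != 0;
}

// Asks the OS where possible; reports "nothing detected" elsewhere.
ProbeResult probe_os() {
#if defined(_WIN32) && defined(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE) && \
    defined(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE) && \
    defined(PF_ARM_VFP_32_REGISTERS_AVAILABLE)
  return ProbeResult{
      IsProcessorFeaturePresent(PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE) != 0,
      IsProcessorFeaturePresent(PF_ARM_V8_CRYPTO_INSTRUCTIONS_AVAILABLE) != 0,
      IsProcessorFeaturePresent(PF_ARM_VFP_32_REGISTERS_AVAILABLE) != 0};
#else
  return ProbeResult{false, false, false};
#endif
}
```

Separating the probe from the mapping is a design choice of this sketch: it lets the bitmask logic be exercised on any machine, while the OS call stays behind a single function.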
17. Windows and MSVC Specific Learnings
Intrinsics Fun
• Atomic read-write barriers and functions such as CompareExchange
• MSVC offers various intrinsics and we employed those
  • E.g. __nop()
Callout: MSVC Intrinsics
#ifdef __GNUC__
// __nop needs volatile so that compiler doesn't optimize it away
#define NOP() asm volatile ("nop");
#elif defined(_MSC_VER)
// Use MSVC intrinsic: https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=vs-2019#I
#define NOP() __nop();
#endif
Callout: Platform friendly intrinsics for atomic functions
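The same compiler-switch pattern used for `NOP()` applies to atomic functions such as CompareExchange. The sketch below is one possible shape, with a hypothetical wrapper name `cmpxchg_long`; it uses the MSVC intrinsic `_InterlockedCompareExchange` and, as an assumed counterpart for GCC/Clang, the builtin `__sync_val_compare_and_swap`. It is not the HotSpot Atomic API.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical wrapper: atomically sets *dest to new_value if *dest equals
// expected. Returns the value that was in *dest before the operation.
inline long cmpxchg_long(volatile long* dest, long expected, long new_value) {
#if defined(_MSC_VER)
  // MSVC argument order: destination, exchange value, comparand.
  return _InterlockedCompareExchange(dest, new_value, expected);
#else
  // GCC/Clang argument order: pointer, old value, new value.
  return __sync_val_compare_and_swap(dest, expected, new_value);
#endif
}
```

Note the differing argument order between the two intrinsics; hiding it behind one wrapper keeps call sites identical on both compilers. A caller checks success by comparing the return value with `expected`.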
18. Windows and MSVC Specific Learnings
Would the real 64-bit please stand up!?
LP64 vs LLP64 issues:
Extending 64-bit long integers and pointers to the long-long integers and pointers needed by the 64-bit Windows platform
• Most Unix-like systems (Linux, macOS, etc.) follow the LP64 data model
• Windows follows the LLP64 model
LP64 vs LLP64 data models (widths in bits)
                   LP64   LLP64
short              16     16
int                32     32
long               64     32
long long          64     64
size_t, pointers   64     64
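In practice, the table means that code assuming `long` is 64 bits wide breaks on Windows. A minimal sketch of the portable alternatives, using fixed-width types from `<cstdint>`, is shown below; the helper names (`bits_of`, `long_holds_pointer`, `pointer_round_trips`, `bit`) are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Width of a type in bits.
template <typename T>
constexpr int bits_of() {
  return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// These hold under both LP64 and LLP64.
static_assert(bits_of<std::int64_t>() == 64, "int64_t is always 64 bits");
static_assert(bits_of<std::uintptr_t>() == bits_of<void*>(),
              "uintptr_t is as wide as a pointer on mainstream platforms");

// True under LP64 (long is 64 bits), false under LLP64 (long is 32 bits).
constexpr bool long_holds_pointer() {
  return sizeof(long) >= sizeof(void*);
}

// Portable: round-trip a pointer through uintptr_t instead of long.
bool pointer_round_trips(const void* p) {
  std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(p);
  return reinterpret_cast<const void*>(raw) == p;
}

// Portable 64-bit bit mask; writing 1L << n would overflow a 32-bit long
// under LLP64 for n >= 32.
std::uint64_t bit(unsigned n) {
  return std::uint64_t{1} << n;
}
```

The general rule this sketch follows: use `int64_t`/`uint64_t` where 64 bits are required and `intptr_t`/`uintptr_t` where a pointer must fit, and avoid `long` and the `L` literal suffix in code shared between Unix-like systems and Windows.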
19. Meet Us and Our Port
The Trifecta
Port Timeline: Feb 2020, Mar 2020, Apr 2020, May 2020, Jun 2020, Jul 2020, Aug 2020, Sep 2020, Oct 2020
• $ bin/java.exe -version works!
• Core Dump:
  # Java VM: OpenJDK 64-Bit Server VM (fastdebug 15-internal+… windows-aarch64)
• eMAG system
20. Meet Us and Our Port
The Trifecta
Port Timeline: Feb 2020, Mar 2020, Apr 2020, May 2020, Jun 2020, Jul 2020, Aug 2020, Sep 2020, Oct 2020
• Great collaboration with the MSVC team
• eMAG system
• ThunderX2 system
• Surface Pro X system
• JTReg tier 1 testing, JMH subset runs and JVM2008 + JBB2015 subset runs
21. Meet Us and Our Port
The Trifecta
Port Timeline: Feb 2020, Mar 2020, Apr 2020, May 2020, Jun 2020, Jul 2020, Aug 2020, Sep 2020, Oct 2020
• eMAG system
• ThunderX2 system
• Surface Pro X system
• Parts of SPEC SERT work with C1 and Parallel GC
• Benchmark mods needed for JNI and >64 cores identification on Windows
• C2 is functional
• SPEC SERT changes made
• Full scale testing underway
22. Meet Us and Our Port
The Trifecta
Port Timeline: Feb 2020, Mar 2020, Apr 2020, May 2020, Jun 2020, Jul 2020, Aug 2020, Sep 2020, Oct 2020
• eMAG system
• ThunderX2 system
• Surface Pro X system
• Start discussions with RedHat on the patches
• Fixed G1 GC: introduced a workaround
• Patches surfaced on OpenJDK and first EA build released
• Exception handling bug fixed for Swing and Java 2D + Surface Pro X
23. Meet Us and Our Port
The Trifecta
Port Timeline: Feb 2020, Mar 2020, Apr 2020, May 2020, Jun 2020, Jul 2020, Aug 2020, Sep 2020, Oct 2020
• eMAG system
• ThunderX2 system
• Surface Pro X system
• JEP drafted
• JEP became a Candidate
• SPEC SERT changes approved
• All three of us became Committers to OpenJDK AArch64 Project
• All GCs enabled and heap + JVM scaling tests completed
• JDK 16 targeted
• We are OpenJDK!
25. Testing Setup & CI
• We divided our changes into incremental patch sets and made sure that they apply cleanly to tip.
• We tested the patches incrementally on Linux/AArch64, Windows/AArch64, Windows/x64, and, only for functional testing, Linux/x64.
• We enabled CI for JTReg tests on our patches on the test platforms.
Phases | Windows + AArch64 | Linux + AArch64 | Windows + X86-64 (VMs OK) | Linux + X86-64 (VMs OK)
Phase 1 | hotspot:tier1, jdk:tier1 and langtools | hotspot:tier1, jdk:tier1 and langtools | hotspot:tier1, jdk:tier1 and langtools | hotspot:tier1, jdk:tier1 and langtools
Phase 2 | - | hotspot:all, jdk:all and langtools | hotspot:all, jdk:all and langtools | hotspot:all, jdk:all and langtools
Phase 3 | hotspot:all, jdk:all and langtools | - | - | -
Phase 4 | JTReg + GTests + Adopt QA tests | JTReg + GTests + Adopt QA tests | JTReg + GTests + Adopt QA tests | JTReg + GTests + Adopt QA tests
26. Testing Setup & CI
Test system | Hardware | Patch combo | hbIR max | hbIR settled | max-JOPS | critical-JOPS
linux-aarch64 | TX2 | P1 | 100% | 98% | 94% | 49%
linux-aarch64 | TX2 | P1 + P2 | 100% | 97% | 94% | 50%
windows-aarch64 | TX2 | P1 + P2 | 100% | 88% | 83% | 40%
windows-x86-64 | Skylake | P1 | 100% | 90% | 85% | 27%
windows-x86-64 | Skylake | P1 + P2 | 100% | 90% | 85% | 27%
27. Our Workload Status and Benchmark Matrix
Workload | Tests | Arm64 Status | Comments
Java Regression Testing Framework (JTREG) | hotspot:all, jdk:all and langtools | o* | We will soon include Gtests and Adopt QA tests
Java Micro Benchmark Harness (JMH) | many microbenchmarks used for performance and implementation testing | o* | We ran the 'jmh-jdk-microbenchmarks' suite on our test platforms and found no significant issues
SPEC JBB2015 | | o* | Performance studies with newer GCs still WIP
SPEC JBB2005 | | o* | Performance studies with newer GCs still WIP
SPEC JVM2008 | crypto, xml, derby, compress, etc. | o* | startup doesn't work on JDK8+
SPEC SERT | | o | Benchmark changes made to accommodate the new platform combo, up-streamed to SPEC
DaCapo Benchmark | avrora fop h2 jython luindex lusearch lusearch-fix pmd sunflow tomcat xalan | o* | One benchmark utilizes an x86-64 dll. A few others don't work on JDK8+. Lower priority
VSCode | | o |
Minecraft Server | | o |
Legend (Status):
o = Complete
o* = Almost there, see notes
x = Incomplete
29. Next Steps
The journey has just started …
• macOS + Apple Silicon port
  • Collaboration with Azul
  • Learnings from Windows port: R18, CPU feature detection, etc.
  • RFE: https://github.com/openjdk/aarch64-port/pull/2
• JVMCI and AOT in HotSpot
  • RFE: https://github.com/openjdk/jdk/pull/685
• Backport to JDK11 for Windows and macOS
  • RFE: https://github.com/openjdk/aarch64-port/tree/jdk11-windows
• Keep the port up-to-date
• Working closely with the Windows Memory Management team
• Working closely with the MSVC team
32. Thank You!
Download and Take it for a Spin: https://github.com/microsoft/openjdk-aarch64/releases
Our GitHub Repo: https://github.com/microsoft/openjdk-aarch64
Our OpenJDK PR: https://github.com/openjdk/jdk/pull/212
A few announcements:
https://devblogs.microsoft.com/java/announcing-openjdk-windows-arm/
https://www.infoq.com/news/2020/08/openjdk-win10-arm/
https://www.infoq.com/news/2020/09/microsoft-windows-mac-arm/
Editor's Notes
We will provide:
1) a quick timeline of our development efforts and Microsoft’s journey into OpenJDK land (spoiler alert: we were welcomed with open arm(s) (pun intended)),
2) a few Arm64 and Windows nuances,
3) our testing and benchmarking North Stars.
If we feel lucky, we will even showcase a demo on our Surface Pro X device.