Under the Hood of the
Testarossa JIT Compiler
Mark Stoodley
Senior Software Developer
IBM Runtime Technologies
September 19, 2016
2
Important disclaimers
• THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
• WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION
CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED.
• ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED
ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR
INFRASTRUCTURE DIFFERENCES.
• ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
• IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT
PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
• IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT
OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
• NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
– CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS
OR THEIR SUPPLIERS AND/OR LICENSORS
3
• Worked on 2 completely different
production Java JIT compilers since
2002 after compiler & architecture
graduate work at University of Toronto
• Current architect of Testarossa JIT
• Eclipse OMR open source project lead
Who am I?
4
• Created in 1998 as an IBM closed source project
– Java ME to SE to many languages/compilation scenarios
– Built by IBM compiler team in Toronto (Markham) Canada
• Best known as IBM Java JIT since IBM SDK for Java 5.0 (2005)
– Early show as debug sidecar in IBM Java 1.4.2 (2004)
– Designed in conjunction with J9 JVM technology
• Also used for other IBM compiler backends and binary translators
Testarossa: backend compiler technology
5
Testarossa technology highlights: 1998-…
• Languages:
– Production: Java ME and SE, COBOL, PL/I, Z binary emulator, binary (re)optimizer
– Prototypes: Ruby, Python, SOM++, and more…
• Some technology highlights implemented by the Java JIT:
– Cooperative suspend (1999)
– Diagnostic abilities: e.g. limit files, per method options (1999)
– Full optimization while supporting type accurate GC (1999)
– AOT (rom-able) compilation for Java (1999)
– Aggressive runtime native code patching (2000)
– Invocation and time-based compilation triggers (2000)
– Adaptive compilation (cold, warm, hot, very hot, scorching) (1999)
– JIT profiling infrastructure and optimizations (2001)
– Speculative class hierarchy based inlining and optimization (2001)
– Fairly complete set of classical compiler optimizations and dataflow analyses (2001)
– Java-specific optimizations like “check” removal (2001)
– Java debug support (2001)
– Escape analysis and stack allocation (2001)
– Automatic lock coarsening (2002)
– Multiple code caches (2005)
– Asynchronous compilation (2006)
– Interpreter profiling (2006)
– Real-time Specification for Java (AOT and JIT) (2005)
– Dynamic AOT compilation for Java (2006)
– Hot Code Replacement support (2007)
– Compressed references (2007)
– Multiple compilation threads (2010)
– On stack replacement (2013)
– Transactional Memory (2013)
– Packed objects (2013)
– Multitenancy (2013)
– Auto SIMD (2014)
– Auto GPU (2014)
– Heuristic tuning and retuning (1999–ongoing)
• Platforms that are or have been supported:
– ME: ARM32, X86(IA32), MIPS, POWER, SH4
– 32-bit SE: ARM, POWER, X86, Z
– 64-bit SE: POWER, X86, Z
– Hard real-time (RTSJ compliant): IA32
– COBOL, PL/I, COBOL Automatic Binary Optimizer: Z
– Z binary emulator: X86, P
• Performance metrics that have been or are actively tracked:
– Latency (elapsed time)
– Throughput (operations / sec)
– Start-up time
– Ramp-up time
– CPU consumption
– Resource consumption at idle
– Compilation time
– Memory footprint
– JIT library size
– Incremental pauses
• Hardware exploitation highlights:
– Efficient CPU instruction sequences
– Managing different kinds of hardware registers
– Exploiting hardware data type support
– Cryptographic, compression acceleration
– Character conversion loop recognition and acceleration
– Atomic locking and other synchronization optimization
– Simultaneous Multi Threading
– Transactional Memory
– SIMD (Single instruction multiple data)
– GPU (Graphics processing unit)
6
On the track: performance keeps going up!
Charts: Apache Spark 1.4 (Databricks), 1/geometric mean, and DayTrader online stock trading application, throughput (ops/sec), across Java 6 (SR16 FP4), Java 6.1 (SR8 FP4), Java 7 (SR9), Java 7.1 (SR3) and Java 8 (SR1). Labelled gains: 1.53X, 2.00X, 2.29X, 2.76X on one chart and 1.35X, 1.60X, 1.76X, 1.96X on the other.
7
• J9 and Testarossa have played a critical role in advancing Java
performance
– Competitive, often industry-leading, performance for 11 years now
– You have benefited from competitive pressure on your JDK even if
you don’t actually use the IBM SDK for Java
• J9 and Testarossa are now being open sourced
You all benefit from it!
8
IBM SDK for Java built from open source
Diagram: OpenJDK with HotSpot alongside Eclipse OMR; OpenJDK + Open J9 + OMR; IBM SDK for Java = OpenJDK + Open J9 + OMR + IBM-isms.
• Eclipse OMR: proven adaptable technology in the open for rapid innovation and collaboration across multiple language communities
• Open J9: Java community open innovation and collaboration, deep platform exploitation for X86 & IBM hardware platforms (OpenPOWER, LinuxONE)
• Communities beyond Java, each built on OMR: COBOL, PL/I, Emulator, Ruby?, Python?, JS?, Swift?, …
• IBM-isms: long term support, quick response for problems, and other forms of IBM customer specific engagement
9
How did we create Eclipse OMR?
10
Start from IBM J9 Java Runtime
Diagram: the J9 Java Execution Environment is made of the J9 Java Platform Abstraction Layer, J9 Java Garbage Collector, J9 Java Diagnostic and Monitoring Services and J9 Java Just-In-Time Compiler. Java source goes through a source code to bytecode/AST compiler (the J9 Java Bytecode Compiler) and runs on an interpreter (the J9 Java Bytecode Interpreter).
11
Refactor “Java”-ness into a Glue layer that adds
language specifics to each core component
Diagram: inside the J9 Java Execution Environment, the core components become the OMR Platform Abstraction Layer, OMR Garbage Collector, OMR Diagnostic and Monitoring Services and OMR Just in Time (JIT) Compiler, with Java specifics supplied by J9 Java GC Glue, J9 Java Diagnostic and Monitoring Glue and J9 Java JIT Compiler Glue. The J9 Java Bytecode Compiler and J9 Java Bytecode Interpreter stay Java-specific.
12
Form Eclipse OMR around core components
Diagram: OMR Platform Abstraction Layer, OMR Garbage Collector, OMR Diagnostic and Monitoring Services, OMR Just in Time (JIT) Compiler.
13
http://www.eclipse.org/omr
https://github.com/eclipse/omr
https://developer.ibm.com/open/omr/
Dual License:
Eclipse Public License V1.0
Apache 2.0
Users and contributors very welcome
https://github.com/eclipse/omr/blob/master/CONTRIBUTING.md
Eclipse OMR
Created March 2016
14
port: platform abstraction (porting) library
thread: cross platform pthread-like threading library
vm: APIs to manage per-interpreter and per-thread contexts
gc: garbage collection framework for managed heaps
compiler: extensible compiler framework
jitbuilder: WIP project to simplify bring up for a new JIT compiler
omrtrace: library for publishing trace events for monitoring/diagnostics
omrsigcompat: signal handling compatibility library
example: demonstration code to show how a language runtime might consume OMR components, also used for testing
fvtest: language independent test framework built on the example glue so that components can be tested outside of a language runtime; uses the Google Test 1.7 framework
+ a few others
~800KLOC at this point, more components coming!
OMR components
15
IBM Contributed
500KLOC of Testarossa
September 17, 2016
16
• TR JIT design principles
• How compilation works
• AOT compilation
• Wrap-up
Rest of the talk is on Testarossa JIT
17
Be transparent
Users shouldn’t be aware of the JIT
(except that the application runs a lot faster!)
JIT design principle #1
18
Let the interpreter handle the hard stuff
Optimize for roughly the top 75% of cases
with a “simple” solution
JIT design principle #2
19
Pay attention to the costs
Overheads can very easily trump benefits
Profile data occupies space
Consider what will happen at scale (10K+ classes)
JIT design principle #3
20
Use the right optimization tool for the job
Prove when you can prove easily
Guard when you can’t prove or can’t prove easily
Speculate appropriately for the bias
JIT design principle #4
21
Compilers can do amazing things
Remember the “unreadable” list of highlight technologies from slide 5!
Many items on that list did not exist or had never been done in a
production runtime system before Java
Also keep in mind
22
Compilers are not all powerful
Can’t change algorithms
Engineering constraints can take away a lot of options
Also keep in mind
23
“JIT as optimizer for interpreter”
is reasonable starting point
But it’s not how either production Java runtime compiler evolved
IMO interpreter should focus on getting it right without being really slow
JIT compiler should make it fast but stay as simple as possible
Also keep in mind
24
So how does it work?
25
• Methods almost always start out running in interpreter
– Interpreter simulates the Java Virtual Machine
– Uses a ”program counter” (pc) to point at the current bytecode
– Conceptually just a loop loading and simulating bytecode at *pc
do {
switch (*pc) {
…
case BCdup :
t=pop();push(t); push(t); pc++; break;
…
}
} while (!finishedProgram());
J9 JVM: methods start off interpreted
26
• Remember: the interpreter has to handle all the hard stuff!
• It is a switch loop
– But uses computed gotos
– Deals with exceptions
– Deals with all the various things that can go wrong
– Does some profiling
– Counts method invocations to trigger JIT compilations
– …
• More info in Dan Heidinga’s talk tomorrow on the J9 interpreter
Tuesday @ 12:30 in Continental Ballroom 1/2/3
OK, it’s more complicated than that
27
Interpreter helps JIT compiler do a good job
Diagram (J9 JVM): each thread has VM state (pc, sp, Java stack) and native state; the bytecode interpreter walks the method bytecodes, e.g. 15: ificmpne 29 … 23: instanceof … 29: invokev <C.foo>.
28
Interpreter collects profiles
Diagram (J9 JVM): thread, bytecode interpreter, VM state (pc, sp, Java stack), native state and method bytecodes, plus a thread profile buffer recording branch directions, actual classes and invocation targets.
• Per thread buffer: no mutex!
• Buffer is an event trace: method, bytecode locator, data (e.g. receiver class)
• Very easy to store and bump cursor into the buffer
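A per-thread event-trace buffer of this kind can be sketched as below. This is an illustrative sketch under assumptions: the record layout (method id, bytecode index, one data word) and the names ProfileEvent and ThreadProfileBuffer are hypothetical. Because each buffer belongs to one thread, recording an event is just a store plus a cursor bump, with no mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical trace entry: which method and bytecode, plus the observed
// value (a branch direction, a receiver class id, an invocation target, ...).
struct ProfileEvent {
    uint32_t methodId;
    uint32_t bytecodeIndex;
    uint64_t data;
};

// Owned by exactly one application thread, so no locking is needed.
class ThreadProfileBuffer {
public:
    explicit ThreadProfileBuffer(size_t capacity) : events_(capacity), cursor_(0) {
        assert(capacity > 0);
    }

    // Store the event and bump the cursor. Returns true when the buffer
    // has just become full and should be handed off.
    bool record(uint32_t methodId, uint32_t bytecodeIndex, uint64_t data) {
        assert(cursor_ < events_.size());
        events_[cursor_++] = ProfileEvent{methodId, bytecodeIndex, data};
        return cursor_ == events_.size();
    }

    size_t size() const { return cursor_; }
    const ProfileEvent& at(size_t i) const { return events_[i]; }
    void reset() { cursor_ = 0; }

private:
    std::vector<ProfileEvent> events_;
    size_t cursor_;
};
```

The buffer size is the knob the following slides mention: a larger buffer means fewer hand-offs but more lag before the data reaches the aggregated profile.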
29
Threads collect into buffer until full
Diagram (J9 JVM): threads 1 to 4 each fill their own profile buffer (A, B, C, D).
30
When buffer fills, put onto a queue
Diagram (J9 JVM): thread 1's full buffer A goes onto the profile buffer queue and thread 1 continues with buffer E; threads 2 to 4 still use B, C and D.
• Only one queue, so needs a mutex
• But only held when buffers fill and only to enqueue/dequeue
• Impact tunable with buffer size
• Trade-off: lag for profile data, footprint
31
Enqueue, allocate new buffer, keep going
Diagram (J9 JVM): the queue now holds A and C; thread 3 continues with a new buffer F, while threads 1, 2 and 4 use E, B and D.
• Queue decouples profile collection from profile aggregation
• Pool of empty buffers reduces allocation stress
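The hand-off can be sketched as a single mutex-protected queue of full buffers plus a pool of recycled empty ones. Assumptions: a buffer is modelled as std::vector<uint64_t>, and BufferQueue, pushFullAndTakeEmpty, popFull and recycle are hypothetical names. The lock is taken only when a buffer fills or is drained, never per event.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

using Buffer = std::vector<uint64_t>;   // hypothetical stand-in for a trace buffer

class BufferQueue {
public:
    explicit BufferQueue(size_t bufferCapacity) : capacity_(bufferCapacity) {}

    // Application thread: enqueue a full buffer and get an empty one back,
    // reusing a pooled buffer when one is available.
    std::unique_ptr<Buffer> pushFullAndTakeEmpty(std::unique_ptr<Buffer> full) {
        std::lock_guard<std::mutex> lock(mutex_);
        full_.push_back(std::move(full));
        if (!pool_.empty()) {
            std::unique_ptr<Buffer> b = std::move(pool_.back());
            pool_.pop_back();
            return b;
        }
        auto b = std::make_unique<Buffer>();
        b->reserve(capacity_);
        return b;
    }

    // Processing thread: dequeue the oldest full buffer, or nullptr if none.
    std::unique_ptr<Buffer> popFull() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (full_.empty()) return nullptr;
        std::unique_ptr<Buffer> b = std::move(full_.front());
        full_.pop_front();
        return b;
    }

    // Processing thread: return a drained buffer to the pool.
    void recycle(std::unique_ptr<Buffer> b) {
        assert(b != nullptr);
        b->clear();   // keeps its capacity, so reuse needs no reallocation
        std::lock_guard<std::mutex> lock(mutex_);
        pool_.push_back(std::move(b));
    }

private:
    size_t capacity_;
    std::mutex mutex_;
    std::deque<std::unique_ptr<Buffer>> full_;
    std::vector<std::unique_ptr<Buffer>> pool_;
};
```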
32
Another thread processes buffers
Diagram (J9 JVM): a buffer processing thread takes queued buffers (A, C, E) and updates the aggregated profile data structure; threads 1 to 4 now use buffers G, B, F and D.
• Iterate through trace, adding entries one by one to profile
33
JIT threads read & write aggregated profile
Diagram (J9 JVM): the buffer processing thread updates the aggregated profile data structure, which JIT threads 1 to N read and write; buffers E and C remain queued while threads 1 to 4 use G, B, F and D.
• Aggregated profile also requires a mutex!
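Aggregation can be sketched as walking each trace and bumping a counter keyed by (method, bytecode, observed value), with one mutex because JIT threads query the same structure concurrently. The keying scheme and the names AggregatedProfile, addTrace and mostFrequent are assumptions for illustration; the slides do not describe J9's actual data structure.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <tuple>
#include <utility>
#include <vector>

struct ProfileEvent {
    uint32_t methodId;
    uint32_t bytecodeIndex;
    uint64_t data;          // e.g. receiver class id or branch direction
};

class AggregatedProfile {
public:
    // Buffer processing thread: fold one trace into the profile, entry by entry.
    void addTrace(const std::vector<ProfileEvent>& trace) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const ProfileEvent& e : trace)
            ++counts_[std::make_tuple(e.methodId, e.bytecodeIndex, e.data)];
    }

    // JIT thread: most frequently observed value at one bytecode and its count,
    // or {0, 0} if nothing was recorded there.
    std::pair<uint64_t, uint64_t> mostFrequent(uint32_t methodId, uint32_t bytecodeIndex) const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::pair<uint64_t, uint64_t> best{0, 0};
        auto it = counts_.lower_bound(std::make_tuple(methodId, bytecodeIndex, uint64_t{0}));
        for (; it != counts_.end(); ++it) {
            if (std::get<0>(it->first) != methodId || std::get<1>(it->first) != bytecodeIndex)
                break;   // left the entries for this bytecode
            if (it->second > best.second) best = {std::get<2>(it->first), it->second};
        }
        return best;
    }

private:
    using Key = std::tuple<uint32_t, uint32_t, uint64_t>;
    mutable std::mutex mutex_;
    std::map<Key, uint64_t> counts_;
};
```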
34
1. Invocation count while interpreted used for initial compilation
• When a method’s count reaches zero, trigger method compile
2. Sampling thread
• Periodically (10ms or so) ask active threads to sample themselves
• If a method catches enough samples over time: trigger method recompile
• Samples in interpreted methods dramatically reduce invocation count
How do those JIT threads get work?
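The two triggers can be sketched as a per-method counter that the interpreter decrements on each invocation and that a sample landing in interpreted code cuts sharply. The thresholds and the names MethodCounters, onInvoke and onSample are illustrative assumptions, not J9's actual values or API; each function reports whether a compilation should be triggered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-method state.
struct MethodCounters {
    explicit MethodCounters(int32_t initialCount) : invocationCount(initialCount) {
        assert(initialCount > 0);
    }
    int32_t invocationCount;            // counts down while interpreted
    bool compiled = false;
    uint32_t samplesSinceCompile = 0;
};

// Interpreter: called on every invocation of an interpreted method.
// Returns true exactly once, when the count reaches zero.
bool onInvoke(MethodCounters& m) {
    if (m.compiled || m.invocationCount <= 0) return false;
    return --m.invocationCount == 0;
}

// Sampling thread: called when a periodic sample lands in this method.
bool onSample(MethodCounters& m, int32_t interpretedPenalty, uint32_t recompileThreshold) {
    if (!m.compiled) {
        if (m.invocationCount <= 0) return false;    // already triggered
        m.invocationCount -= interpretedPenalty;     // hot while interpreted: cut the count
        if (m.invocationCount <= 0) { m.invocationCount = 0; return true; }
        return false;
    }
    // Compiled method: enough samples over time request a recompile.
    return ++m.samplesSinceCompile == recompileThreshold;
}
```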
35
• “trigger” just means to enqueue a method on compilation queue
– Based on current conditions, select an optimization plan
– May already be queued, may be queued with different plan
• Testarossa compilations are (mostly) asynchronous
– Application thread continues running after enqueuing the method
• Testarossa can employ multiple compilation threads
– Dynamically resized pool based on compilation load, # cores,
configuration (e.g. how important is memory vs. ramp-up speed?)
Triggering a compilation
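Triggering can be sketched as a compilation queue keyed by method, where a repeat request for an already-queued method only upgrades its plan. The plan levels mirror the strategies listed later in the talk; CompilationQueue, trigger, next and the "keep the more aggressive plan" rule are assumptions for illustration. The caller returns immediately, which is what makes compilation asynchronous.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

enum class Plan { Cold, Warm, Hot, VeryHotProfiling, Scorching };

class CompilationQueue {
public:
    // Application or sampling thread: enqueue and return immediately.
    // Returns false if the method was already queued (its plan may be upgraded).
    bool trigger(const std::string& method, Plan plan) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = plans_.find(method);
        if (it != plans_.end()) {
            if (plan > it->second) it->second = plan;   // keep the more aggressive plan
            return false;
        }
        plans_.emplace(method, plan);
        order_.push_back(method);
        return true;
    }

    // Compilation thread: take the oldest request, if any.
    std::optional<std::pair<std::string, Plan>> next() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (order_.empty()) return std::nullopt;
        std::string method = order_.front();
        order_.pop_front();
        assert(plans_.count(method) == 1);
        Plan plan = plans_.at(method);
        plans_.erase(method);
        return std::make_pair(method, plan);
    }

private:
    std::mutex mutex_;
    std::deque<std::string> order_;
    std::map<std::string, Plan> plans_;
};
```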
36
• Compiler thread dramatically oversimplified algorithm:
while (!done) {
method = getNextMethodFromQueue();
if (sharedClassesCache->hasAOTCompiledMethod(method))
… = loadAotCompiledMethod(method);
else
… = compile(method); // may store AOT code to cache
commitCompiledMethod( … );
}
• You have questions, I know…
What does a compilation thread do?
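The loop above can be made runnable as a sketch under stated assumptions: the shared classes cache is modelled as a std::map from method name to stored code, "compiling" just produces a string, and CompileThread, aotCache and installed are hypothetical names. The real loop also selects plans, handles failures and runs on several threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <thread>

struct CompileThread {
    std::mutex mutex;
    std::condition_variable cv;
    std::deque<std::string> queue;
    bool done = false;
    std::map<std::string, std::string> aotCache;   // stand-in for the shared classes cache
    std::map<std::string, std::string> installed;  // committed code per method
    int loads = 0, compiles = 0;

    void enqueue(const std::string& method) {
        { std::lock_guard<std::mutex> lock(mutex); queue.push_back(method); }
        cv.notify_one();
    }

    void shutdown() {
        { std::lock_guard<std::mutex> lock(mutex); done = true; }
        cv.notify_all();
    }

    // Body of the compilation thread: drain the queue until shutdown.
    void run() {
        while (true) {
            std::string method;
            {
                std::unique_lock<std::mutex> lock(mutex);
                cv.wait(lock, [this] { return done || !queue.empty(); });
                if (queue.empty()) return;      // done and nothing left to compile
                method = queue.front();
                queue.pop_front();
            }
            assert(!method.empty());
            std::string code;
            auto hit = aotCache.find(method);
            if (hit != aotCache.end()) {        // AOT hit: reuse stored code
                code = hit->second;             // (a real JVM would also apply relocations)
                ++loads;
            } else {                            // JIT compile, then store AOT code
                code = "jit(" + method + ")";
                ++compiles;
                aotCache[method] = code;
            }
            installed[method] = code;           // commit the compiled method
        }
    }
};
```

Usage: start `std::thread worker([&ct] { ct.run(); });`, enqueue methods, then call `ct.shutdown()` and `worker.join()`.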
37
– Let’s start by explaining the compiler itself
The real work: the compiler thread
38
Testarossa Compilation Process
Diagram: IL Generation feeds the Optimizer (analyses and optimizations, with cold, warm, hot, very hot, scorching, profiling, AOT and FSD strategies), which feeds the Code Generators (x86, POWER, Z, ARM) that produce code, metadata, runtime and RT helpers. Supporting components: Runtime Environment/Configuration (options, object model, memory, threading, tracing) and a Profile Manager fed by hardware counters, the sampling thread, interpreter profile info, JIT profile info and the profiler.
39
Convert the method’s bytecodes to
Testarossa’s Intermediate Language (IL)
Have slides but not enough time ☹
Come talk to me if you’re interested!
First step: IL Generation
40
• IL generator focuses on correctness
• Strive to avoid complexity for performance
– *striving* not always successful
• Rely on the optimizer to make it fast
Second Step: Make the IL Better
41
• About 70 basic optimizations
• Three high level categories:
1. Traditional compiler optimizations requiring little adaptation for Java
e.g. reaching definitions, block ordering, expression simplification, …
2. Traditional compiler optimizations with Java adaptation
e.g. inlining, partial redundancy elimination, loop versioning, auto
parallelization (SIMD, GPU), …
3. Optimizations developed for Java
e.g. escape analysis, monitor coarsening, async check insertion, …
Testarossa Optimizations
42
• Strategy is just a sequence of individual optimizations
– Contains groups that can be repeated or looped
– Opts can be conditional on earlier opts finding/creating opportunities
• 6 strategies with increasing compilation cost & expected payback
1. NoOpt not used by default
2. Cold initial compile during startup
3. Warm initial compile after startup or upgrade
4. Hot methods consuming > ~1% of CPU
5. Very Hot with Profiling collect profile before a scorching compile
6. Scorching methods consuming > ~12.5% of CPU
Optimization Strategies
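A strategy as "a sequence of individual optimizations" with repeatable groups and conditional steps can be sketched as data plus a small driver. The toy IL, the Step structure and the "repeat a group while it keeps changing the IL" rule are assumptions for illustration; Testarossa's real strategy tables and passes are not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy IL: counters that passes can shrink, plus a log of passes that fired.
struct IL { int redundantExprs; int deadStores; std::vector<std::string> log; };

// A pass returns true if it changed the IL (found an opportunity).
using Pass = std::function<bool(IL&)>;

struct Step {
    std::string name;
    Pass pass;                          // used when group is empty
    std::vector<Step> group;            // non-empty: a group of steps
    int maxIterations = 1;              // groups may repeat while they make progress
    bool onlyIfPreviousChanged = false; // conditional on the preceding step
};

// Run a strategy; returns true if any step changed the IL.
bool runStrategy(const std::vector<Step>& steps, IL& il) {
    bool anyChange = false, previousChanged = false;
    for (const Step& s : steps) {
        if (s.onlyIfPreviousChanged && !previousChanged) continue;
        bool changed = false;
        if (!s.group.empty()) {
            for (int i = 0; i < s.maxIterations; ++i) {
                if (!runStrategy(s.group, il)) break;   // stop once nothing changes
                changed = true;
            }
        } else {
            changed = s.pass(il);
            if (changed) il.log.push_back(s.name);
        }
        previousChanged = changed;
        anyChange |= changed;
    }
    return anyChange;
}
```

Encoding strategies as data like this is one way to let cold, warm, hot and scorching share passes while differing only in which steps run and how often.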
43
• Testarossa has 4 main code generators:
– X86 (32- and 64-bit)
– POWER (32- and 64-bit, BE and LE)
– Z (IBM mainframe) (31-bit and 64-bit)
– ARM 32-bit
• Responsible for converting Testarossa IL into native instructions
– Generate fast instruction sequences for current processor
– Efficient assignment of registers
– Layout of native stack frame
– Other very detailed things based on intricate workings of processors
Third step: code generation
44
Such a simple idea:
Store JIT compiled code then
“Just” load into another JVM
AOT compilation for Java
45
Compiled code is for method, and
Methods come from classes…
But it’s not so simple
46
But what is a “class”?
Diagram: C extends B, B extends A; A implements I1 and I2; C implements I3.
class A implements I1, I2 { … }
class B extends A { … }
class C extends B implements I3 { … }
47
Inside a JVM
Diagram: resolved classes A, B and C with interfaces I1, I2 and I3.
Compiler and applications work on objects of resolved classes, e.g. C objects embed a B, which embeds an A, and C implements I3 as well as I1 and I2.
48
Outside a JVM: sea of class files
• C extends a class called “B” and implements an interface called “I3”
• B extends a class called “A”
• A implements interfaces called “I1” and “I2”
Diagram: the class files are scattered across directories, some duplicated:
src/directory1/: A.class, I1.class, I2.class
src/directory2/: A.class, I1.class, I2.class
src/directory3/: B.class, C.class
src/directory4/: C.class, I3.class
49
• Class files can change
• Classpath can change
• Class files can be added or removed
“Class” identity is a very complicated notion
50
• Class loader object used to load the class can change
– Ever heard of an application class loader object outside of a JVM?
– Class loader objects (like other objects) don’t exist outside the JVM
– Serialization doesn’t help: what to deserialize to replace what object?
• Two class loaders can even load the exact same class files to
create two unique classes in a single JVM
• All perfectly valid scenarios under the JVM specification
And it even gets worse (!)
51
Seems grim, what can we do?
52
• We did it this way for a long time (in the embedded space and for WebSphere Real Time)
– AOT code stored alongside binary loadable version of class files called JXEs (kind of like a jar file)
• Class references aren’t the only problem though
– Compiled code also directly references addresses in the JVM
– e.g. Pointers to constant pools, pointers to ”ROM” parts of classes (see Dan Heidinga’s talk!)
– e.g. Pointers to helper functions in JIT runtime
• Code generator also builds relocation records alongside the code
– e.g. at code offset 0x208 is the address of the compiled method’s class’s constant pool
– e.g. at code offset 0x4C3 is the 4 byte relative address of JIT helper jitNewObject()
• At class load time, process relocations to bind code into current JVM process
First cut: treat everything as unresolved
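Relocation processing can be sketched as patching recorded offsets in the stored code with addresses looked up in the current process. The two record kinds, host byte order, the x86-style "relative to the end of the 4-byte field" rule, and the names RelocationRecord and applyRelocations are assumptions; real records are target-specific and far more varied. The function patches the bytes in place, so on failure the caller discards them and JIT compiles instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

enum class RelocKind { AbsoluteAddress64, RelativeAddress32 };

// Hypothetical record: at codeOffset, store the address of `symbol`.
struct RelocationRecord {
    uint32_t codeOffset;
    RelocKind kind;
    std::string symbol;      // e.g. "constantPool" or "jitNewObject"
};

// codeBase: address the code will run at; symbols: addresses in this JVM.
// Returns false if a symbol is unknown or a relative target is out of range.
bool applyRelocations(std::vector<uint8_t>& code, uint64_t codeBase,
                      const std::vector<RelocationRecord>& relocs,
                      const std::map<std::string, uint64_t>& symbols) {
    for (const RelocationRecord& r : relocs) {
        auto it = symbols.find(r.symbol);
        if (it == symbols.end()) return false;
        if (r.kind == RelocKind::AbsoluteAddress64) {
            assert(r.codeOffset + 8 <= code.size());
            uint64_t value = it->second;
            std::memcpy(&code[r.codeOffset], &value, sizeof value);
        } else {
            assert(r.codeOffset + 4 <= code.size());
            // Relative to the end of the 4-byte field, like an x86 call displacement.
            int64_t delta = static_cast<int64_t>(it->second) -
                            static_cast<int64_t>(codeBase + r.codeOffset + 4);
            if (delta < INT32_MIN || delta > INT32_MAX) return false;
            int32_t rel = static_cast<int32_t>(delta);
            std::memcpy(&code[r.codeOffset], &rel, sizeof rel);
        }
    }
    return true;
}
```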
53
• Our shared classes cache (SCC) debuted in Java 5.0
– Shared memory region mapped into every JVM process
– Accelerates start-up by speeding up class loading
– By itself, accelerated app server start-up by 20-30%
• Also created an opportunity to use AOT code “dynamically”
– SCC handles part of problem: “is this the same class I had before”
– So: AOT compile in first JVM run, store into SCC, load in other JVMs
• For Java 6, we revamped our AOT compilation story
– Made some improvements in code quality
– Provide another roughly 20% start-up improvement
Next goal: use AOT to accelerate startup
54
Simplified class loading, no shared cache
Diagram: for class B { … }; class C extends B { … };, JVM Process A reads B.class and C.class, builds its own ROMClasses for B and C, then a RAMClass for each.
55
Simplified class loading, no shared cache
Diagram: for class B { … }; class C extends B { … };, JVM Process A and JVM Process B each read B.class and C.class and each build their own ROMClasses and RAMClasses for B and C.
56
Simplified class loading with shared cache
Diagram: for class B { … }; class C extends B { … };, JVM Process A stores the ROMClasses for B and C, built from B.class and C.class, in the shared cache and builds its RAMClasses for B and C from them.
57
Simplified class loading with shared cache
Diagram: for class B { … }; class C extends B { … };, JVM Process A and JVM Process B both memory map the same shared cache holding the B and C ROMClasses; each process builds only its own RAMClasses for B and C.
58
How did we make AOT better
with the shared class cache?
59
• Start-up scenario: usually running the same code over and over
– Anything you learn in first run *probably* applies in second run too
• Some optimizations are clearly ok for AOT:
– e.g. Block ordering uses block frequencies to rearrange code nicely
– Different profile in second run? Ok, it runs a bit more slowly
– But usually, the profile is incredibly similar
• Can also rely on some tricks:
– Any information local to this method or this class (fields, methods)
– Shared cache gave us a way to identify and check other methods
Dynamic AOT to accelerate start-up
60
• Some direct calls can just be inlined
– Direct call to, say, this class’s constructor
• Inline more direct calls using virtual guard infrastructure
– AOT compile optimistically generates guard as a NOP
– AOT load evaluates the guard at AOT load time (via relocation record)
– Turn NOP into a jump to an unresolved call if relocation record fails
• Shared classes cache helps to inline virtual calls from “this”
– Can reason about the vtable of the class of the compiled method
Inlining for AOT methods
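The guard can be sketched by modelling its patchable instruction as a flag: it starts as a NOP (fall into the inlined body) and AOT load flips it to a jump to the out-of-line call when the relocation check fails. GuardedCallSite, validateAtLoad and invoke are hypothetical, and plain functions stand in for patched machine code.

```cpp
#include <cassert>
#include <functional>
#include <string>

// A guarded call site as an AOT compile might leave it (hypothetical model).
struct GuardedCallSite {
    std::string expectedTarget;   // method assumed when inlining, e.g. "B.foo"
    bool patchedToJump = false;   // false = NOP: run the inlined code

    // AOT load: evaluate the guard's relocation record in this JVM.
    void validateAtLoad(const std::function<std::string()>& resolveTarget) {
        assert(!expectedTarget.empty());
        if (resolveTarget() != expectedTarget) patchedToJump = true;
    }

    // Execute the call site: inlined fast path, or the out-of-line call.
    int invoke(const std::function<int()>& inlinedBody,
               const std::function<int()>& slowPathCall) const {
        return patchedToJump ? slowPathCall() : inlinedBody();
    }
};
```

The design point the slide makes is that the check costs nothing at run time when validation succeeds: the guard stays a NOP.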
61
Using the vtable for virtual “this” calls
Diagram (JVM Process 1): the resolved vtable of class C holds the J9Method for the resolved “B.foo()”, which points to the ROMMethod for foo() from B.class.
class B { public void foo() {…} } class C extends B { void bar() { this.foo(); } }
62
No SCC: are B.foo and B’.foo the same? No idea!
Diagram: in JVM Process 1, C's resolved vtable leads to the J9Method for “B.foo()” and a ROMMethod for foo() from B.class; in JVM Process 2, C’'s resolved vtable leads to the J9Method for “B’.foo()” and a separate ROMMethod for foo() from B’.class.
class B { public void foo() {…} } class C extends B { void bar() { this.foo(); } }
63
SCC: are B.foo and B’.foo the same? Can answer!
Diagram: in JVM Process 1 and JVM Process 2, the resolved vtables of C and C’ lead to J9Methods for “B.foo()” and “B’.foo()”, but both J9Methods point to the same ROMMethod, foo() from B.class, at the same offset in the SCC.
class B { public void foo() {…} } class C extends B { void bar() { this.foo(); } }
64
• ROMMethod includes the bytecodes
– If class’s vtable has a J9Method with the right ROMMethod, then the
right bytecodes will be inlined
– Still need to be careful about other code aspects e.g. field offsets
– But you know you got the same method implementation
• Just like the JIT:
– Need to check to make sure there isn’t another possible target
– Need to register runtime assumptions against future class loads
• Still wrap the inlined code in a guard resolved at AOT load time
– If not the right or only target: back off to a virtual invocation
Only needs to be same “enough”
65
• Profile guard: C.method profiled as most common target
if (o.clazz == <common receiver class C address>)
{ /* inlined C.method() */ }
else
o.method();
• C needs to be a resolved class
• Typically used for interface invokes
– Not as straightforward as vtable
But we needed something stronger
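The profile guard can be mirrored in C++, using typeid as a stand-in for comparing the object's class pointer. This is an analogy, not how the JIT emits code: the JIT compares a class address loaded from the object, and its inlined body is a compiled copy of C.method(). Base, C, D and guardedCall are hypothetical names.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int method() const { return 0; }
};
struct C : Base {                  // receiver class the profile saw most often
    int method() const override { return 42; }
};
struct D : Base {
    int method() const override { return 7; }
};

int guardedCall(const Base& o, int& fastPathHits) {
    if (typeid(o) == typeid(C)) {  // profile guard: o.clazz == C
        ++fastPathHits;
        // Stands in for inlined C.method(): a qualified call is bound statically.
        return static_cast<const C&>(o).C::method();
    }
    return o.method();             // fallback: ordinary virtual dispatch
}
```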
66
• List of super classes and implemented interfaces for a class
– Every one must have a ROMClass in the shared cache
– AOT compiles record “validation relocation” for every referenced
resolved class (offset of a class chain in the SCC)
– AOT loads walk class chains in parallel with resolved classes in
current JVM
– Anything not right: bail and requeue method as JIT compile
• Still one challenge though:
– How to look up the resolved class pointer for “some class”?
– Need a class loader to do that!
We implemented “class chains”
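Class chain validation can be sketched as follows: at AOT compile time record, for a resolved class, the SCC offsets of the ROMClasses of the class, its superclasses and their directly implemented interfaces; at AOT load time rebuild the chain from the classes resolved in this JVM and compare. The ResolvedClass layout, the integer offsets and the function names are assumptions, and superinterfaces of interfaces are not walked in this simplified version.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical view of a resolved class in the current JVM.
struct ResolvedClass {
    std::optional<uint64_t> romClassSccOffset;      // empty if not in the shared cache
    const ResolvedClass* superclass = nullptr;
    std::vector<const ResolvedClass*> interfaces;
};

// The class itself, then each superclass, each followed by its direct
// interfaces. nullopt means some class has no ROMClass in the cache.
std::optional<std::vector<uint64_t>> buildClassChain(const ResolvedClass* c) {
    std::vector<uint64_t> chain;
    for (; c != nullptr; c = c->superclass) {
        if (!c->romClassSccOffset) return std::nullopt;
        chain.push_back(*c->romClassSccOffset);
        for (const ResolvedClass* i : c->interfaces) {
            if (!i->romClassSccOffset) return std::nullopt;
            chain.push_back(*i->romClassSccOffset);
        }
    }
    return chain;
}

// AOT load: any mismatch means bail and requeue the method as a JIT compile.
bool validateClassChain(const std::vector<uint64_t>& recorded, const ResolvedClass* current) {
    std::optional<std::vector<uint64_t>> chain = buildClassChain(current);
    return chain.has_value() && *chain == recorded;
}
```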
67
How can you find
a class loader object in this JVM
that corresponds to
the “same” class loader object from another JVM?
Exercise for the audience
68
I don’t have time today to tell you how we did it
☹
Come talk to me if you’re really interested!
Exercise for the audience
69
• Modularity work in JDK9 opening up interesting opportunities
• Possibility to AOT compile entire modules
• Sounds awesome but not a straightforward win:
– Typically don’t know much about execution profile at load time
– AOT code is generally much larger than bytecodes (10X footprint)
– Generality/flexibility of JDK libraries could hurt us if not careful
• Locales, etc. not used in all runs but maybe in some run
• Some interesting new possible optimization opportunities
– But remember the JIT design principles!
Where do we go with AOT?
70
• IBM Runtimes are going open source
– 800KLOC already contributed to Eclipse OMR project for all runtimes
– Working on the remainder in and around Java 9 development
– You’re welcome to join us at Eclipse OMR and, later, Open J9!
– Any feedback welcome!
• Testarossa is a high performance, modular compiler technology
– 500KLOC now open sourced at Eclipse OMR
– Provides steady and significant performance uplift (through effort!)
– Around 70 optimizations with code generators for 4 hardware platforms
– Took a deep dive into Testarossa’s AOT compilation technology
Wrap Up
71
• Mark Stoodley mstoodle@ca.ibm.com @mstoodle
• Eclipse OMR www.eclipse.org/omr www.github.com/eclipse/omr
• Other J9 developer talks at Java One
– Dan Heidinga on Tuesday at 2:30 in Continental Ballroom 1/2/3
– Charlie Gracie on Wednesday at 10am in Golden Gate 2/3
• Visit me and other J9 devs at the IBM Booth
– I’ll be there tomorrow morning at 9:30am
• I will also be at the Eclipse booth Tuesday at about 4pm - 5:30pm
Thank You!
72
Legal Notice
IBM and the IBM logo are trademarks or registered trademarks of IBM Corporation, in the United
States, other countries or both.
Java and all Java-based marks, among others, are trademarks or registered trademarks of Oracle in
the United States, other countries or both.
Other company, product and service names may be trademarks or service marks of others.
THE INFORMATION DISCUSSED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL
PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND
ACCURACY OF THE INFORMATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED, AND IBM SHALL NOT BE RESPONSIBLE FOR ANY
DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, SUCH
INFORMATION. ANY INFORMATION CONCERNING IBM'S PRODUCT PLANS OR STRATEGY
IS SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

How to downscope your EBS upgrade project
 
How to Deploy & Integrate Oracle EPM Cloud Profitability and Cost Management ...
How to Deploy & Integrate Oracle EPM Cloud Profitability and Cost Management ...How to Deploy & Integrate Oracle EPM Cloud Profitability and Cost Management ...
How to Deploy & Integrate Oracle EPM Cloud Profitability and Cost Management ...
 
[SiriusCon 2020] Realization of Model-Based Safety Analysis and Integration w...
[SiriusCon 2020] Realization of Model-Based Safety Analysis and Integration w...[SiriusCon 2020] Realization of Model-Based Safety Analysis and Integration w...
[SiriusCon 2020] Realization of Model-Based Safety Analysis and Integration w...
 
MuleSoft Online Meetup - Salesforce Streaming APIs
MuleSoft Online Meetup - Salesforce Streaming APIsMuleSoft Online Meetup - Salesforce Streaming APIs
MuleSoft Online Meetup - Salesforce Streaming APIs
 
Yocto project and open embedded training
Yocto project and open embedded trainingYocto project and open embedded training
Yocto project and open embedded training
 
HKG18-402 - Build secure key management services in OP-TEE
HKG18-402 - Build secure key management services in OP-TEEHKG18-402 - Build secure key management services in OP-TEE
HKG18-402 - Build secure key management services in OP-TEE
 

Similar to Under the Hood of the Testarossa JIT Compiler

FOSDEM 2017 - Open J9 The Next Free Java VM
FOSDEM 2017 - Open J9 The Next Free Java VMFOSDEM 2017 - Open J9 The Next Free Java VM
FOSDEM 2017 - Open J9 The Next Free Java VMCharlie Gracie
 
J9: Under the hood of the next open source JVM
J9: Under the hood of the next open source JVMJ9: Under the hood of the next open source JVM
J9: Under the hood of the next open source JVMDanHeidinga
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...Intel® Software
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterTim Ellison
 
EclipseOMRBuildingBlocks4Polyglot_TURBO18
EclipseOMRBuildingBlocks4Polyglot_TURBO18EclipseOMRBuildingBlocks4Polyglot_TURBO18
EclipseOMRBuildingBlocks4Polyglot_TURBO18Xiaoli Liang
 
IBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkIBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkAdamRobertsIBM
 
SemeruRuntimesUnderTheCover .pptx
SemeruRuntimesUnderTheCover .pptxSemeruRuntimesUnderTheCover .pptx
SemeruRuntimesUnderTheCover .pptxSumanMitra22
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016Tim Ellison
 
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.J On The Beach
 
Unisanta - Visão Geral de hardware Servidor IBM System z
Unisanta - Visão Geral de hardware Servidor IBM System zUnisanta - Visão Geral de hardware Servidor IBM System z
Unisanta - Visão Geral de hardware Servidor IBM System zAnderson Bassani
 
zEC12 e zBC12 Hardware Overview
zEC12 e zBC12 Hardware OverviewzEC12 e zBC12 Hardware Overview
zEC12 e zBC12 Hardware OverviewFelipe Lanzillotta
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance ObservationsAdam Roberts
 
Relative Capacity por Eduardo Oliveira e Joseph Temple
Relative Capacity por Eduardo Oliveira e Joseph TempleRelative Capacity por Eduardo Oliveira e Joseph Temple
Relative Capacity por Eduardo Oliveira e Joseph TempleJoao Galdino Mello de Souza
 
Modernização do Gerenciamento, Monitoramento e Provisionamento em Mainframes ...
Modernização do Gerenciamento, Monitoramento e Provisionamento em Mainframes ...Modernização do Gerenciamento, Monitoramento e Provisionamento em Mainframes ...
Modernização do Gerenciamento, Monitoramento e Provisionamento em Mainframes ...Joao Galdino Mello de Souza
 
Getting Started with JDK Mission Control
Getting Started with JDK Mission ControlGetting Started with JDK Mission Control
Getting Started with JDK Mission ControlMarcus Hirt
 
Java on z overview 20161107
Java on z overview 20161107Java on z overview 20161107
Java on z overview 20161107Marcel Mitran
 
z/OS V2.4 Preview: z/OS Container Extensions - Running Linux on Z docker cont...
z/OS V2.4 Preview: z/OS Container Extensions - Running Linux on Z docker cont...z/OS V2.4 Preview: z/OS Container Extensions - Running Linux on Z docker cont...
z/OS V2.4 Preview: z/OS Container Extensions - Running Linux on Z docker cont...zOSCommserver
 
OpenStack and z/VM – What is it and how do I get it?
OpenStack and z/VM – What is it and how do I get it?OpenStack and z/VM – What is it and how do I get it?
OpenStack and z/VM – What is it and how do I get it?Anderson Bassani
 

Similar to Under the Hood of the Testarossa JIT Compiler (20)

FOSDEM 2017 - Open J9 The Next Free Java VM
FOSDEM 2017 - Open J9 The Next Free Java VMFOSDEM 2017 - Open J9 The Next Free Java VM
FOSDEM 2017 - Open J9 The Next Free Java VM
 
J9: Under the hood of the next open source JVM
J9: Under the hood of the next open source JVMJ9: Under the hood of the next open source JVM
J9: Under the hood of the next open source JVM
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
EclipseOMRBuildingBlocks4Polyglot_TURBO18
EclipseOMRBuildingBlocks4Polyglot_TURBO18EclipseOMRBuildingBlocks4Polyglot_TURBO18
EclipseOMRBuildingBlocks4Polyglot_TURBO18
 
IBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache SparkIBM Runtimes Performance Observations with Apache Spark
IBM Runtimes Performance Observations with Apache Spark
 
SemeruRuntimesUnderTheCover .pptx
SemeruRuntimesUnderTheCover .pptxSemeruRuntimesUnderTheCover .pptx
SemeruRuntimesUnderTheCover .pptx
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016
 
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
 
Unisanta - Visão Geral de hardware Servidor IBM System z
Unisanta - Visão Geral de hardware Servidor IBM System zUnisanta - Visão Geral de hardware Servidor IBM System z
Unisanta - Visão Geral de hardware Servidor IBM System z
 
Open j9 jdk on RISC-V
Open j9 jdk on RISC-VOpen j9 jdk on RISC-V
Open j9 jdk on RISC-V
 
zEC12 e zBC12 Hardware Overview
zEC12 e zBC12 Hardware OverviewzEC12 e zBC12 Hardware Overview
zEC12 e zBC12 Hardware Overview
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance Observations
 
Relative Capacity por Eduardo Oliveira e Joseph Temple
Relative Capacity por Eduardo Oliveira e Joseph TempleRelative Capacity por Eduardo Oliveira e Joseph Temple
Relative Capacity por Eduardo Oliveira e Joseph Temple
 
Modernização do Gerenciamento, Monitoramento e Provisionamento em Mainframes ...
Modernização do Gerenciamento, Monitoramento e Provisionamento em Mainframes ...Modernização do Gerenciamento, Monitoramento e Provisionamento em Mainframes ...
Modernização do Gerenciamento, Monitoramento e Provisionamento em Mainframes ...
 
Getting Started with JDK Mission Control
Getting Started with JDK Mission ControlGetting Started with JDK Mission Control
Getting Started with JDK Mission Control
 
Java on z overview 20161107
Java on z overview 20161107Java on z overview 20161107
Java on z overview 20161107
 
z/OS V2.4 Preview: z/OS Container Extensions - Running Linux on Z docker cont...
z/OS V2.4 Preview: z/OS Container Extensions - Running Linux on Z docker cont...z/OS V2.4 Preview: z/OS Container Extensions - Running Linux on Z docker cont...
z/OS V2.4 Preview: z/OS Container Extensions - Running Linux on Z docker cont...
 
OpenStack and z/VM – What is it and how do I get it?
OpenStack and z/VM – What is it and how do I get it?OpenStack and z/VM – What is it and how do I get it?
OpenStack and z/VM – What is it and how do I get it?
 

More from Mark Stoodley

Eliminating the Pauses in your Java Application
Eliminating the Pauses in your Java ApplicationEliminating the Pauses in your Java Application
Eliminating the Pauses in your Java ApplicationMark Stoodley
 
Oh the compilers you'll build
Oh the compilers you'll buildOh the compilers you'll build
Oh the compilers you'll buildMark Stoodley
 
Turbo2018 workshop JIT as a Service
Turbo2018 workshop   JIT as a ServiceTurbo2018 workshop   JIT as a Service
Turbo2018 workshop JIT as a ServiceMark Stoodley
 
Jit builder status and directions 2018 03-28
Jit builder status and directions 2018 03-28Jit builder status and directions 2018 03-28
Jit builder status and directions 2018 03-28Mark Stoodley
 
JavaOne 2017 - Mark Stoodley - Open Sourcing IBM J9 JVM
JavaOne 2017 - Mark Stoodley - Open Sourcing IBM J9 JVMJavaOne 2017 - Mark Stoodley - Open Sourcing IBM J9 JVM
JavaOne 2017 - Mark Stoodley - Open Sourcing IBM J9 JVMMark Stoodley
 
VMIL keynote : Lessons from a production JVM runtime developer
VMIL keynote : Lessons from a production JVM runtime developerVMIL keynote : Lessons from a production JVM runtime developer
VMIL keynote : Lessons from a production JVM runtime developerMark Stoodley
 
Eclipse OMR: a modern toolkit for building language runtimes
Eclipse OMR: a modern toolkit for building language runtimesEclipse OMR: a modern toolkit for building language runtimes
Eclipse OMR: a modern toolkit for building language runtimesMark Stoodley
 

More from Mark Stoodley (7)

Eliminating the Pauses in your Java Application
Eliminating the Pauses in your Java ApplicationEliminating the Pauses in your Java Application
Eliminating the Pauses in your Java Application
 
Oh the compilers you'll build
Oh the compilers you'll buildOh the compilers you'll build
Oh the compilers you'll build
 
Turbo2018 workshop JIT as a Service
Turbo2018 workshop   JIT as a ServiceTurbo2018 workshop   JIT as a Service
Turbo2018 workshop JIT as a Service
 
Jit builder status and directions 2018 03-28
Jit builder status and directions 2018 03-28Jit builder status and directions 2018 03-28
Jit builder status and directions 2018 03-28
 
JavaOne 2017 - Mark Stoodley - Open Sourcing IBM J9 JVM
JavaOne 2017 - Mark Stoodley - Open Sourcing IBM J9 JVMJavaOne 2017 - Mark Stoodley - Open Sourcing IBM J9 JVM
JavaOne 2017 - Mark Stoodley - Open Sourcing IBM J9 JVM
 
VMIL keynote : Lessons from a production JVM runtime developer
VMIL keynote : Lessons from a production JVM runtime developerVMIL keynote : Lessons from a production JVM runtime developer
VMIL keynote : Lessons from a production JVM runtime developer
 
Eclipse OMR: a modern toolkit for building language runtimes
Eclipse OMR: a modern toolkit for building language runtimesEclipse OMR: a modern toolkit for building language runtimes
Eclipse OMR: a modern toolkit for building language runtimes
 

Recently uploaded

Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 

Recently uploaded (20)

Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 

Under the Hood of the Testarossa JIT Compiler

1. Under the Hood of the Testarossa JIT Compiler
Mark Stoodley
Senior Software Developer
IBM Runtime Technologies
September 19, 2016
2. Important disclaimers
• THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
• WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
• ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.
• ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
• IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
• IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
• NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
– CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS
3. Who am I?
• Worked on 2 completely different production Java JIT compilers since 2002, after compiler & architecture graduate work at the University of Toronto
• Current architect of the Testarossa JIT
• Eclipse OMR open source project lead
4. Testarossa: backend compiler technology
• Created in 1998 as an IBM closed source project
– Java ME to SE to many languages/compilation scenarios
– Built by the IBM compiler team in Toronto (Markham), Canada
• Best known as the IBM Java JIT since IBM SDK for Java 5.0 (2005)
– Early appearance as a debug sidecar in IBM Java 1.4.2 (2004)
– Designed in conjunction with J9 JVM technology
• Also used for other IBM compiler backends and binary translators
5. Testarossa technology highlights: 1998-…
• Languages:
– Production: Java ME and SE, COBOL, PL/I, Z binary emulator, binary (re)optimizer
– Prototypes: Ruby, Python, SOM++, and more…
• Some technology highlights implemented by the Java JIT:
– Cooperative suspend (1999)
– Diagnostic abilities: e.g. limit files, per-method options (1999)
– Full optimization while supporting type-accurate GC (1999)
– AOT (ROM-able) compilation for Java (1999)
– Aggressive runtime native code patching (2000)
– Invocation- and time-based compilation triggers (2000)
– Adaptive compilation (cold, warm, hot, very hot, scorching) (1999)
– JIT profiling infrastructure and optimizations (2001)
– Speculative class-hierarchy-based inlining and optimization (2001)
– Fairly complete set of classical compiler optimizations and dataflow analyses (2001)
– Java-specific optimizations like “check” removal (2001)
– Java debug support (2001)
– Escape analysis and stack allocation (2001)
– Automatic lock coarsening (2002)
– Multiple code caches (2005)
– Asynchronous compilation (2006)
– Interpreter profiling (2006)
– Real-time Specification for Java (AOT and JIT) (2005)
– Dynamic AOT compilation for Java (2006)
– Hot Code Replacement support (2007)
– Compressed references (2007)
– Multiple compilation threads (2010)
– On-stack replacement (2013)
– Transactional Memory (2013)
– Packed objects (2013)
– Multitenancy (2013)
– Auto SIMD (2014)
– Auto GPU (2014)
– Heuristic tuning and retuning (1999–ongoing)
• Platforms that are or have been supported:
– ME: ARM32, X86 (IA32), MIPS, POWER, SH4
– 32-bit SE: ARM, POWER, X86, Z
– 64-bit SE: POWER, X86, Z
– Hard real-time (RTSJ compliant): IA32
– COBOL, PL/I, COBOL Automatic Binary Optimizer: Z
– Z binary emulator: X86, P
• Performance metrics that have been or are actively tracked:
– Latency (elapsed time)
– Throughput (operations/sec)
– Start-up time
– Ramp-up time
– CPU consumption
– Resource consumption at idle
– Compilation time
– Memory footprint
– JIT library size
– Incremental pauses
• Hardware exploitation highlights:
– Efficient CPU instruction sequences
– Managing different kinds of hardware registers
– Exploiting hardware data type support
– Cryptographic, compression acceleration
– Character conversion loop recognition and acceleration
– Atomic locking and other synchronization optimization
– Simultaneous Multi Threading
– Transactional Memory
– SIMD (Single instruction multiple data)
– GPU (Graphics processing unit)
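The list above pairs invocation- and time-based compilation triggers with adaptive compilation levels. The following is a minimal C sketch of how those two ideas can combine, assuming a hypothetical invocation threshold for the first compilation and a hypothetical sample-count threshold for each later promotion. The names `next_level`, `level_name`, and both thresholds are illustrative inventions, and the one-level-at-a-time promotion is a deliberately simplified policy, not Testarossa's actual heuristic.

```c
/* Optimization levels named on the slide, cheapest to most aggressive,
   plus a state for methods that are still interpreted. */
typedef enum { INTERPRETED, COLD, WARM, HOT, VERY_HOT, SCORCHING } OptLevel;

/* Hypothetical thresholds, for illustration only. */
enum { INVOCATION_THRESHOLD = 1000, SAMPLE_THRESHOLD = 50 };

/* Simplified promotion policy (NOT Testarossa's real heuristic):
   - invocation-based trigger: an interpreted method is first compiled once
     it has been called INVOCATION_THRESHOLD times;
   - time-based trigger: if a sampling profiler keeps finding a compiled
     method on the CPU, recompile it one level higher. */
OptLevel next_level(OptLevel current, unsigned invocations, unsigned samples) {
    if (current == INTERPRETED)
        return invocations >= INVOCATION_THRESHOLD ? COLD : INTERPRETED;
    if (current < SCORCHING && samples >= SAMPLE_THRESHOLD)
        return (OptLevel)(current + 1);
    return current;
}

/* Human-readable level names, e.g. for a verbose compilation log. */
const char *level_name(OptLevel level) {
    static const char *const names[] = {
        "interpreted", "cold", "warm", "hot", "very hot", "scorching"
    };
    return names[level];
}
```

Cheap counting decides *whether* to compile at all; the more expensive sampling data is only consulted for methods that have already proven worth compiling, which echoes the "pay attention to the costs" principle later in the talk.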
6. On the track: performance keeps going up!
[Chart: IBM SDK for Java releases Java 6 (SR16 FP4), Java 6.1 (SR8 FP4), Java 7 (SR9), Java 7.1 (SR3) and Java 8 (SR1), on two workloads: Daytrader online stock trading application, throughput (ops/sec); Apache Spark 1.4 Databricks, 1/geometric mean. Improvement labels on the later releases: 1.53X, 2.00X, 2.29X, 2.76X and 1.35X, 1.60X, 1.76X, 1.96X.]
7. You all benefit from it!
• J9 and Testarossa have played a critical role in advancing Java performance
– Competitive, often industry-leading, performance for 11 years now
– You have benefited from competitive pressure on your JDK even if you don’t actually use the IBM SDK for Java
• J9 and Testarossa are now being open sourced
8. IBM SDK for Java built from open source
[Diagram: Open JDK + HotSpot; Eclipse OMR; Open JDK + Open J9 + OMR; IBM SDK for Java = Open JDK + Open J9 + OMR + IBM isms]
• Eclipse OMR: proven adaptable technology in the open for rapid innovation and collaboration across multiple language communities
• Open JDK + Open J9 + OMR: Java community open innovation and collaboration, deep platform exploitation for X86 & IBM hardware platforms (OpenPOWER, Linux ONE)
• OMR communities beyond Java: COBOL, PL/I, Emulator, Ruby? Python? JS? Swift? …
• IBM SDK for Java: long term support, quick response for problems, and other forms of IBM customer-specific engagement
9. How did we create Eclipse OMR?
10. Start from IBM J9 Java Runtime
• J9 Java Execution Environment
• J9 Java Platform Abstraction Layer
• J9 Java Garbage Collector
• J9 Java Diagnostic and Monitoring Services
• Source code: Java Source
• Bytecode/AST compiler: J9 Java Bytecode Compiler
• Interpreter: J9 Java Bytecode Interpreter
• J9 Java Just-In-Time Compiler
11. Refactor “Java”-ness into a Glue layer that adds language specifics to each core component
• OMR Platform Abstraction Layer
• OMR Garbage Collector + J9 Java GC Glue
• OMR Diagnostic and Monitoring Services + J9 Java Diagnostic and Monitoring Glue
• OMR Just in Time (JIT) Compiler + J9 Java JIT Compiler Glue
• Still Java-specific: J9 Java Execution Environment, Java Source, J9 Java Bytecode Compiler, J9 Java Bytecode Interpreter
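One way to picture the "Glue layer" is a core component that calls back into language-specific code for anything that depends on the language's object model. The following is a minimal C sketch under that assumption; the `LanguageGlue` struct, its callbacks, and the toy two-reference object layout are hypothetical and are not Eclipse OMR's real API.

```c
#include <stddef.h>

/* Language-independent core code only ever sees an opaque Object. */
typedef struct Object Object;

/* Hypothetical glue interface supplied by each language runtime. */
typedef struct {
    size_t (*reference_count)(const Object *obj);          /* # of ref slots */
    Object *(*reference_at)(const Object *obj, size_t i);  /* i-th reference */
} LanguageGlue;

/* Core component: a toy marking-style walk that counts objects reachable
   from root without knowing anything about the object layout.
   Assumes the object graph is a tree (no sharing, no cycles) for brevity. */
size_t core_count_reachable(const LanguageGlue *glue, const Object *root) {
    if (root == NULL)
        return 0;
    size_t total = 1;
    for (size_t i = 0; i < glue->reference_count(root); i++)
        total += core_count_reachable(glue, glue->reference_at(root, i));
    return total;
}

/* Glue for a toy language whose objects hold exactly two references. */
struct Object { Object *left, *right; };

static size_t toy_reference_count(const Object *obj) {
    (void)obj;
    return 2;
}

static Object *toy_reference_at(const Object *obj, size_t i) {
    return i == 0 ? obj->left : obj->right;
}

const LanguageGlue toy_glue = { toy_reference_count, toy_reference_at };
```

Porting the core to another language then means writing a new `LanguageGlue`, not touching `core_count_reachable`.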
12. Form Eclipse OMR around core components
• OMR Platform Abstraction Layer
• OMR Garbage Collector
• OMR Diagnostic and Monitoring Services
• OMR Just in Time (JIT) Compiler
13. Eclipse OMR: created March 2016
• http://www.eclipse.org/omr
• https://github.com/eclipse/omr
• https://developer.ibm.com/open/omr/
• Dual license: Eclipse Public License V1.0, Apache 2.0
• Users and contributors very welcome: https://github.com/eclipse/omr/blob/master/CONTRIBUTING.md
14. OMR components
• port: platform abstraction (porting) library
• thread: cross-platform pthread-like threading library
• vm: APIs to manage per-interpreter and per-thread contexts
• gc: garbage collection framework for managed heaps
• compiler: extensible compiler framework
• jitbuilder: WIP project to simplify bring-up for a new JIT compiler
• omrtrace: library for publishing trace events for monitoring/diagnostics
• omrsigcompat: signal handling compatibility library
• example: demonstration code to show how a language runtime might consume OMR components, also used for testing
• fvtest: language-independent test framework built on the example glue so that components can be tested outside of a language runtime; uses the Google Test 1.7 framework + a few others
~800KLOC at this point, more components coming!
15. OMR components (as on slide 14), with the callout:
• IBM contributed 500KLOC of Testarossa, September 17, 2016
  • 16. Rest of the talk is on the Testarossa JIT: • TR JIT design principles • How compilation works • AOT compilation • Wrap-up
  • 17. JIT design principle #1: Be transparent. Users shouldn’t be aware of the JIT (except that the application runs a lot faster!)
  • 18. JIT design principle #2: Let the interpreter handle the hard stuff. Optimize to target the top 75%-ish of cases with a “simple” solution.
  • 19. JIT design principle #3: Pay attention to the costs. Overheads can very easily trump benefits. Profile data occupies space. Consider what will happen at scale (10K+ classes).
  • 20. JIT design principle #4: Use the right optimization tool for the job. Prove when you can prove easily. Guard when you can’t prove, or can’t prove easily. Speculate appropriately for the bias.
  • 21. Also keep in mind: compilers can do amazing things. Remember the “unreadable” list of highlight technologies from slide 5! Many items on that list did not exist, or had never been done in a production runtime system, before Java.
  • 22. Also keep in mind: compilers are not all-powerful. They can’t change algorithms, and engineering constraints can take away a lot of options.
  • 23. Also keep in mind: “JIT as optimizer for interpreter” is a reasonable starting point, but it’s not how either production Java runtime compiler evolved. IMO, the interpreter should focus on getting it right without being really slow; the JIT compiler should make it fast but stay as simple as possible.
  • 24. So how does it work?
  • 25. J9 JVM: methods start off interpreted. • Methods almost always start out running in the interpreter – The interpreter simulates the Java Virtual Machine – It uses a “program counter” (pc) to point at the current bytecode – Conceptually, it is just a loop loading and simulating the bytecode at *pc: do { switch (*pc) { … case BCdup: t = pop(); push(t); push(t); pc++; break; … } } while (!finishedProgram());
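To make the switch-loop idea concrete, here is a minimal C++ sketch of a conceptual interpreter. Apart from BCdup, the opcode names, their numeric values, and the `interpret` function are hypothetical, chosen only for illustration; a real J9 interpreter does far more (exceptions, profiling, invocation counting, computed gotos).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes for illustration only; these are not J9's bytecode values.
enum Opcode : uint8_t { BCpush1, BCdup, BCiadd, BCreturn };

// Conceptual switch-loop interpreter: `pc` walks the bytecode array and an
// operand stack holds intermediate values. Returns the value left on the stack.
// Assumes a well-formed program that ends with BCreturn.
int interpret(const std::vector<uint8_t>& code) {
    std::vector<int> stack;
    auto push = [&stack](int v) { stack.push_back(v); };
    auto pop = [&stack]() { int v = stack.back(); stack.pop_back(); return v; };

    const uint8_t* pc = code.data();
    bool finished = false;
    do {
        switch (*pc) {
        case BCpush1: push(1); pc++; break;
        case BCdup: { int t = pop(); push(t); push(t); pc++; break; }
        case BCiadd: { int b = pop(); int a = pop(); push(a + b); pc++; break; }
        case BCreturn: finished = true; break;
        default: throw std::invalid_argument("unknown bytecode");
        }
    } while (!finished);
    assert(!stack.empty());
    return pop();
}
```

For example, the sequence BCpush1, BCdup, BCiadd, BCreturn leaves 2 on the stack.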
  • 26. OK, it’s more complicated than that. • Remember: the interpreter has to handle all the hard stuff! • It is a switch loop – But it uses computed gotos – Deals with exceptions – Deals with all the various things that can go wrong – Does some profiling – Counts method invocations to trigger JIT compilations – … • More info in Dan Heidinga’s talk tomorrow on the J9 interpreter: Tuesday @ 12:30 in Continental Ballroom 1/2/3
  • 27. Interpreter helps JIT compiler do a good job. [Figure, J9 JVM: a thread’s state (bytecode interpreter, VM state, native state, Java stack with sp) and its pc pointing into the method bytecodes: … 15: ificmpne 29 … 23: instanceof … 29: invokev <C.foo> …]
  • 28. Interpreter collects profiles. [Figure, J9 JVM: the same thread, now also writing to a thread profile buffer that records branch directions, actual classes, and invocation targets.] Per-thread buffer: no mutex! The buffer is an event trace of (method, bytecode locator, data) entries, where data is e.g. the receiver class. It is very easy to store an entry and bump the cursor into the buffer.
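A sketch of the per-thread event-trace buffer, assuming a fixed-capacity array; the names `ProfileEntry`, `ProfileBuffer`, and `record` are illustrative, not J9’s. Because only the owning thread writes to its buffer, recording an event is a store plus a cursor bump, with no locking.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One profiling event: which method, which bytecode, and a value observed there
// (e.g. a receiver class pointer or a taken-branch flag). Layout is assumed.
struct ProfileEntry {
    const void* method;
    uint32_t bytecodeIndex;
    uintptr_t data;
};

// Per-thread event-trace buffer: only the owning thread writes to it.
template <std::size_t Capacity>
class ProfileBuffer {
public:
    // Store the event and bump the cursor; returns false when the buffer is
    // full and must be handed off before recording can continue.
    bool record(const void* method, uint32_t bci, uintptr_t data) {
        if (cursor_ == Capacity) return false;
        entries_[cursor_++] = ProfileEntry{method, bci, data};
        return true;
    }
    bool full() const { return cursor_ == Capacity; }
    std::size_t size() const { return cursor_; }
    const ProfileEntry& operator[](std::size_t i) const {
        assert(i < cursor_);
        return entries_[i];
    }
    void reset() { cursor_ = 0; }  // reuse the storage after processing

private:
    std::array<ProfileEntry, Capacity> entries_{};
    std::size_t cursor_ = 0;
};
```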
  • 29. Threads collect into a buffer until it is full. [Figure, J9 JVM: threads 1–4 each fill their own profile buffer, A–D.]
  • 30. When a buffer fills, put it onto a queue. [Figure, J9 JVM: thread 1’s full buffer A moves onto the profile buffer queue and thread 1 continues with a new buffer E; threads 2–4 keep buffers B–D.] There is only one queue, so it needs a mutex, but the mutex is only held when buffers fill, and only to enqueue/dequeue. The impact is tunable with buffer size; the trade-off is lag for profile data and footprint.
  • 31. Enqueue, allocate a new buffer, keep going. [Figure, J9 JVM: the queue now holds A and C, and thread 3 continues with a new buffer F.] The queue decouples profile collection from profile aggregation. A pool of empty buffers reduces allocation stress.
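The hand-off on slides 30 and 31 can be sketched as a mutex-protected queue plus a pool of recycled buffers. This is an illustration under assumptions: `ProfileBufferQueue`, `swapFullBuffer`, `recycle`, and the use of `std::vector<int>` as a stand-in buffer are all hypothetical. The property it shows is that the lock is taken only when a buffer fills or is drained, never per recorded event.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

using Buffer = std::vector<int>;  // stand-in for a full per-thread profile buffer

class ProfileBufferQueue {
public:
    explicit ProfileBufferQueue(std::size_t bufferCapacity) : capacity_(bufferCapacity) {}

    // Application thread with a full buffer: enqueue it and get an empty one
    // back, reusing a pooled buffer when possible to reduce allocation stress.
    std::unique_ptr<Buffer> swapFullBuffer(std::unique_ptr<Buffer> full) {
        assert(full != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        filled_.push_back(std::move(full));
        if (!pool_.empty()) {
            std::unique_ptr<Buffer> fresh = std::move(pool_.back());
            pool_.pop_back();
            return fresh;
        }
        auto fresh = std::make_unique<Buffer>();
        fresh->reserve(capacity_);
        return fresh;
    }

    // Buffer-processing thread: take the oldest full buffer, or nullptr if none.
    std::unique_ptr<Buffer> dequeue() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (filled_.empty()) return nullptr;
        std::unique_ptr<Buffer> buf = std::move(filled_.front());
        filled_.pop_front();
        return buf;
    }

    // Return a processed buffer to the pool of empty buffers.
    void recycle(std::unique_ptr<Buffer> buf) {
        buf->clear();
        std::lock_guard<std::mutex> lock(mutex_);
        pool_.push_back(std::move(buf));
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> lock(mutex_);
        return filled_.size();
    }

private:
    std::size_t capacity_;
    std::mutex mutex_;
    std::deque<std::unique_ptr<Buffer>> filled_;
    std::vector<std::unique_ptr<Buffer>> pool_;
};
```

A larger buffer capacity means fewer lock acquisitions, at the cost of more lag before data reaches the aggregated profile and more memory per thread.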
  • 32. Another thread processes buffers. [Figure, J9 JVM: a buffer processing thread takes filled buffers off the profile buffer queue and updates an aggregated profile data structure; thread 1 has moved on to buffer G.] It iterates through the trace, adding entries one by one to the profile.
  • 33. JIT threads read & write the aggregated profile. [Figure, J9 JVM: JIT threads 1 … N access the aggregated profile data structure alongside the buffer processing thread.] The aggregated profile also requires a mutex!
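A minimal sketch of the aggregation step, assuming the aggregated profile is a map from (method, bytecode index) to per-value counts. `Event`, `AggregatedProfile`, `addTrace`, and `dominantValue` are hypothetical names. The buffer-processing thread writes while JIT threads read, so both paths take the same mutex.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Minimal event record (assumed layout): method id, bytecode index, observed value.
struct Event { uint32_t method; uint32_t bci; uintptr_t value; };

class AggregatedProfile {
public:
    // Buffer-processing thread: iterate through the trace, adding entries one by one.
    void addTrace(const std::vector<Event>& trace) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const Event& e : trace) ++counts_[{e.method, e.bci}][e.value];
    }

    // JIT thread: most frequently observed value at a site (e.g. the dominant
    // receiver class). Returns false if the site was never profiled.
    bool dominantValue(uint32_t method, uint32_t bci, uintptr_t& value, uint64_t& count) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto site = counts_.find({method, bci});
        if (site == counts_.end()) return false;
        assert(!site->second.empty());
        count = 0;
        for (const auto& [v, c] : site->second)
            if (c > count) { value = v; count = c; }
        return true;
    }

private:
    std::mutex mutex_;
    std::map<std::pair<uint32_t, uint32_t>, std::map<uintptr_t, uint64_t>> counts_;
};
```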
  • 34. How do those JIT threads get work? 1. The invocation count while interpreted is used for the initial compilation • When a method’s count reaches zero, trigger a method compile 2. Sampling thread • Periodically (every 10ms or so), ask active threads to sample themselves • If a method catches enough samples over time, trigger a method recompile • Samples in interpreted methods dramatically reduce the invocation count
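The two triggers can be sketched as follows. The per-method fields, the amount each interpreted sample subtracts from the count, and the recompile threshold are illustrative assumptions, not Testarossa’s actual counters or defaults.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-method state for the two triggers.
struct MethodState {
    int32_t invocationCount;   // counts down while the method is interpreted
    int32_t samples = 0;       // sampling-thread hits since the last compile
    bool compiled = false;
};

enum class Trigger { None, Compile, Recompile };

// Illustrative tuning knobs (assumed values).
constexpr int32_t kSampleCountPenalty = 100;  // count reduction per interpreted sample
constexpr int32_t kRecompileSamples = 10;     // samples that trigger a recompile

// 1. Interpreter: called on each invocation of an interpreted method.
Trigger onInterpretedInvocation(MethodState& m) {
    assert(!m.compiled);
    return --m.invocationCount <= 0 ? Trigger::Compile : Trigger::None;
}

// 2. Sampling thread (~every 10ms): a thread reports the method it was running.
Trigger onSample(MethodState& m) {
    if (!m.compiled) {
        // A sample in an interpreted method sharply reduces its remaining count.
        m.invocationCount -= kSampleCountPenalty;
        return m.invocationCount <= 0 ? Trigger::Compile : Trigger::None;
    }
    return ++m.samples >= kRecompileSamples ? Trigger::Recompile : Trigger::None;
}
```

With these assumed values, a method starting at a count of 1000 reaches zero after ten samples even if it is rarely invoked, which is how long-running interpreted loops get compiled.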
  • 35. Triggering a compilation: • “Trigger” just means to enqueue a method on the compilation queue – Based on current conditions, select an optimization plan – The method may already be queued, possibly with a different plan • Testarossa compilations are (mostly) asynchronous – The application thread continues running after enqueuing the method • Testarossa can employ multiple compilation threads – A dynamically resized pool, based on compilation load, # of cores, and configuration (e.g. how important is memory vs. ramp-up speed?)
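Since “trigger” only means enqueueing, a compilation queue might look like the sketch below. The plan names follow the optimization strategies listed later in the talk; the rule that a re-triggered method keeps a single queue entry and takes the more aggressive plan is an assumption made for illustration, as are the type and method names.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <string>

// Plans ordered by compilation cost (mirrors the strategy names in the talk).
enum class OptLevel { NoOpt, Cold, Warm, Hot, VeryHotProfiling, Scorching };

struct CompileRequest { std::string method; OptLevel level; };

// Hypothetical queue: trigger() returns immediately, so the application
// thread keeps running while compilation threads do the work.
class CompilationQueue {
public:
    void trigger(const std::string& method, OptLevel level) {
        assert(!method.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        for (CompileRequest& r : queue_) {
            if (r.method == method) {
                // Already queued: keep one entry, upgrading its plan if needed.
                if (level > r.level) r.level = level;
                return;
            }
        }
        queue_.push_back({method, level});
    }

    // Compilation threads pull the oldest request, if any.
    std::optional<CompileRequest> next() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return std::nullopt;
        CompileRequest r = queue_.front();
        queue_.pop_front();
        return r;
    }

private:
    std::mutex mutex_;
    std::deque<CompileRequest> queue_;
};
```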
  • 36. What does a compilation thread do? • Compiler thread, dramatically oversimplified algorithm: while (!done) { method = getNextMethodFromQueue(); if (sharedClassesCache->hasAOTCompiledMethod(method)) … = loadAotCompiledMethod(method); else … = compile(method); /* may store AOT code to cache */ commitCompiledMethod( … ); } • You have questions, I know…
  • 37. The real work: the compiler thread. • Compiler thread, dramatically oversimplified algorithm: while (!done) { method = getNextMethodFromQueue(); if (sharedClassesCache->hasAOTCompiledMethod(method)) … = loadAotCompiledMethod(method); else … = compile(method); /* may store AOT code to cache */ commitCompiledMethod( … ); } • You have questions, I know… – Let’s start by explaining the compiler itself
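A runnable version of the oversimplified loop, under heavy assumptions: compiled code is modeled as a string, the loop drains the queue instead of running until `done`, and the shared classes cache is a plain map. Only `hasAOTCompiledMethod`, `loadAotCompiledMethod`, `compile`, and `commitCompiledMethod` echo names from the slide; everything else is hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

using Code = std::string;  // stand-in for a compiled method body

struct SharedClassesCache {
    std::map<std::string, Code> aotBodies;  // method name -> stored AOT code
    bool hasAOTCompiledMethod(const std::string& m) const { return aotBodies.count(m) != 0; }
    // A real load would process relocations; here we just tag the body.
    Code loadAotCompiledMethod(const std::string& m) const { return aotBodies.at(m) + " (relocated)"; }
};

struct JitRuntime {
    std::deque<std::string> compileQueue;
    SharedClassesCache scc;
    std::map<std::string, Code> installed;  // committed method bodies

    Code compile(const std::string& m) {
        scc.aotBodies.emplace(m, "aot(" + m + ")");  // may store AOT code to the cache
        return "jit(" + m + ")";
    }

    void commitCompiledMethod(const std::string& m, const Code& code) {
        assert(!code.empty());
        installed[m] = code;
    }

    // The compilation thread's loop; an empty queue stands in for "done".
    void compilationThread() {
        while (!compileQueue.empty()) {
            std::string method = compileQueue.front();
            compileQueue.pop_front();
            Code code = scc.hasAOTCompiledMethod(method) ? scc.loadAotCompiledMethod(method)
                                                         : compile(method);
            commitCompiledMethod(method, code);
        }
    }
};
```

Running it twice, with the second runtime given a copy of the first one’s cache, shows the intended effect: the second “JVM” loads C.foo from stored AOT code instead of compiling it.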
  • 38. Testarossa Compilation Process. [Figure: IL Generation feeds the Optimizer (analyses and optimizations; strategies cold, warm, hot, very hot, profiling, scorching, FSD, AOT), which feeds the Code Generators (x86, POWER, Z, ARM). Supporting components: Runtime Environment/Configuration (options, object model, memory, threading, tracing); Runtime (code metadata, RT helpers); Profiler (profile manager fed by hardware counters, the sampling thread, interpreter profile info, and JIT profile info).]
  • 39. First step: IL Generation. Convert the method’s bytecodes to Testarossa’s Intermediate Language (IL). I have slides but not enough time ☹ Come talk to me if you’re interested!
  • 40. Second step: make the IL better. • The IL generator focuses on correctness • Strive to avoid complexity for performance – *Striving* is not always successful • Rely on the optimizer to make it fast
  • 41. Testarossa optimizations: • About 70 basic optimizations • Three high-level categories: 1. Traditional compiler optimizations requiring little adaptation for Java, e.g. reaching definitions, block ordering, expression simplification, … 2. Traditional compiler optimizations with Java adaptation, e.g. inlining, partial redundancy elimination, loop versioning, auto-parallelization (SIMD, GPU), … 3. Optimizations developed for Java, e.g. escape analysis, monitor coarsening, async check insertion, …
  • 42. Optimization strategies: • A strategy is just a sequence of individual optimizations – It contains groups which can be repeated or looped – Opts can be conditional on earlier opts finding/creating opportunities • 6 strategies with increasing compilation cost & expected payback: 1. NoOpt: not used by default 2. Cold: initial compile during startup 3. Warm: initial compile after startup, or upgrade 4. Hot: methods consuming > ~1% of CPU 5. Very Hot with Profiling: collect a profile before a scorching compile 6. Scorching: methods consuming > ~12.5% of CPU
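The strategy idea can be sketched as data: a list of steps, each a group of opts re-run while any of them keeps changing the IL. The toy IL, the two passes in the usage example, and the upgrade rule built from the ~1% and ~12.5% CPU thresholds are illustrative assumptions, not Testarossa’s optimizer interfaces.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy "IL" (hypothetical): counts of remaining optimization opportunities.
struct IL { int redundantExprs = 0; int deadStores = 0; };

// An optimization pass reports whether it changed anything.
struct Opt { std::string name; std::function<bool(IL&)> run; };

// A strategy entry: a group of opts, looped while it keeps finding (or
// creating) opportunities, at most maxIterations times.
struct Step { std::vector<Opt> group; int maxIterations = 1; };

// Runs a strategy in order; returns the names of opts that changed the IL.
std::vector<std::string> runStrategy(const std::vector<Step>& strategy, IL& il) {
    std::vector<std::string> applied;
    for (const Step& step : strategy) {
        for (int i = 0; i < step.maxIterations; ++i) {
            bool changed = false;
            for (const Opt& opt : step.group)
                if (opt.run(il)) { changed = true; applied.push_back(opt.name); }
            if (!changed) break;  // nothing new found: stop looping this group
        }
    }
    return applied;
}

// Hypothetical upgrade rule using the CPU-share thresholds from the slide.
enum class Upgrade { None, Hot, VeryHotProfiling, Scorching };

Upgrade planForCpuShare(double cpuShare, bool alreadyProfiled) {
    if (cpuShare > 0.125) return alreadyProfiled ? Upgrade::Scorching : Upgrade::VeryHotProfiling;
    if (cpuShare > 0.01) return Upgrade::Hot;
    return Upgrade::None;
}
```

In the usage test, a toy CSE pass that turns each redundant expression into a dead store creates work for a dead-store pass, so the group loops until both run out of opportunities.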
  • 43. Third step: code generation. • Testarossa has 4 main code generators: – x86 (32- and 64-bit) – POWER (32- and 64-bit, BE and LE) – Z (IBM mainframe) (31-bit and 64-bit) – ARM 32-bit • They are responsible for converting Testarossa IL into native instructions – Generate fast instruction sequences for the current processor – Efficient assignment of registers – Layout of the native stack frame – Other very detailed things based on the intricate workings of processors
  • 44. AOT compilation for Java: such a simple idea! Store JIT-compiled code, then “just” load it into another JVM.
  • 45. But it’s not so simple: compiled code is for a method, and methods come from classes…
  • 46. But what is a “class”? [Figure: classes A, B, C and interfaces I1, I2, I3.] class A implements I1, I2 { … } class B extends A { … } class C extends B implements I3 { … }
  • 47. Inside a JVM: [Figure: resolved classes A, B, C and interfaces I1, I2, I3.] The compiler and applications work on objects of resolved classes, e.g. C objects embed a B, which embeds an A, and C implements I3 as well as I1 and I2.
  • 48. Outside a JVM: a sea of class files. C extends a class called “B” and implements an interface called “I3”; B extends a class called “A”; A implements interfaces called “I1” and “I2”. [Figure: the same class files scattered across directories: src/directory1/ (A.class, I1.class, I2.class), src/directory2/ (A.class, I1.class, I2.class), src/directory3/ (B.class, C.class), src/directory4/ (C.class, I3.class).]
  • 49. “Class” identity is a very complicated notion: • Class files can change • The classpath can change • Class files can be added or removed
  • 50. And it even gets worse (!): • Class files can change • The classpath can change • Class files can be added or removed • The class loader object used to load the class can change – Ever heard of an application class loader object outside of a JVM? – Class loader objects (like other objects) don’t exist outside the JVM – Serialization doesn’t help: what would you deserialize to replace which object? • Two class loaders can even load the exact same class files to create two unique classes in a single JVM • All perfectly valid scenarios under the JVM specification
  • 51. Seems grim; what can we do?
  • 52. First cut: treat everything as unresolved. • We did it this way for a long time (in the embedded space and for WebSphere Real Time) – AOT code was stored alongside a binary loadable version of class files called JXEs (kind of like a jar file) • Class references aren’t the only problem, though – Compiled code also directly references addresses in the JVM – e.g. pointers to constant pools, pointers to “ROM” parts of classes (see Dan Heidinga’s talk!) – e.g. pointers to helper functions in the JIT runtime • The code generator also builds relocation records alongside the code – e.g. at code offset 0x208 is the address of the compiled method’s class’s constant pool – e.g. at code offset 0x4C3 is the 4-byte relative address of JIT helper jitNewObject() • At class load time, process relocations to bind the code into the current JVM process
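Relocation processing can be sketched as patching position-dependent values into a copy of the stored code. The two relocation kinds mirror the slide’s examples (offset 0x208 holding a constant pool address, offset 0x4C3 holding a 4-byte relative helper address). The record layout, the symbol table, and computing the relative offset from the end of the 4-byte field (as x86 rel32 calls do) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical relocation kinds echoing the slide's two examples.
enum class RelocKind {
    ConstantPoolAddress,  // absolute address of the method's class's constant pool
    HelperRelative32      // 4-byte relative offset to a JIT helper
};

struct Relocation { RelocKind kind; uint32_t codeOffset; std::string symbol; };

// Addresses that only exist in *this* JVM process, looked up at load time.
using SymbolTable = std::map<std::string, uintptr_t>;

// Bind stored AOT code to the current process. `codeBase` is where it will live.
std::vector<uint8_t> applyRelocations(std::vector<uint8_t> code, uintptr_t codeBase,
                                      const std::vector<Relocation>& relocs,
                                      const SymbolTable& symbols) {
    for (const Relocation& r : relocs) {
        uintptr_t target = symbols.at(r.symbol);
        if (r.kind == RelocKind::ConstantPoolAddress) {
            assert(r.codeOffset + sizeof(target) <= code.size());
            std::memcpy(&code[r.codeOffset], &target, sizeof(target));
        } else {
            // Displacement measured from the end of the 4-byte field.
            assert(r.codeOffset + 4 <= code.size());
            intptr_t next = static_cast<intptr_t>(codeBase + r.codeOffset + 4);
            int32_t rel = static_cast<int32_t>(static_cast<intptr_t>(target) - next);
            std::memcpy(&code[r.codeOffset], &rel, sizeof(rel));
        }
    }
    return code;
}
```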
  • 53. Next goal: use AOT to accelerate startup. • Our shared classes cache (SCC) debuted in Java 5.0 – A shared memory region mapped into every JVM process – Accelerates start-up by speeding up class loading – By itself, it accelerated app server start-up by 20-30% • It also created an opportunity to use AOT code “dynamically” – The SCC handles part of the problem: “is this the same class I had before?” – So: AOT compile in the first JVM run, store into the SCC, load in other JVMs • For Java 6, we revamped our AOT compilation story – Made some improvements in code quality – Provided another roughly 20% start-up improvement
  • 54. Simplified class loading, no shared cache. [Figure, JVM Process A, for class B { … }; class C extends B { … };: B.class and C.class are each loaded into a B/C ROMClass and a B/C RAMClass inside the process.]
  • 55. Simplified class loading, no shared cache. [Figure: JVM Process A as on the previous slide, plus JVM Process B, which separately builds its own B and C ROMClasses and RAMClasses from the same class files.]
  • 56. Simplified class loading with shared cache. [Figure, JVM Process A, for class B { … }; class C extends B { … };: B.class and C.class are loaded into B and C ROMClasses stored in the shared cache, while the B and C RAMClasses stay in the process.]
  • 57. Simplified class loading with shared cache. [Figure: JVM Process B memory-maps the same shared cache, reusing the B and C ROMClasses and building only its own B and C RAMClasses.]
  • 58. How did we make AOT better with the shared class cache?
  • 59. Dynamic AOT to accelerate start-up: • Start-up scenario: usually running the same code over and over – Anything you learn in the first run *probably* applies in the second run too • Some optimizations are clearly OK for AOT: – e.g. block ordering uses block frequencies to rearrange code nicely – Different profile in the second run? OK, it runs a bit more slowly – But usually, the profile is incredibly similar • We can also rely on some tricks: – Any information local to this method or this class (fields, methods) – The shared cache gave us a way to identify and check other methods
  • 60. Inlining for AOT methods: • Some direct calls can just be inlined – e.g. a direct call to this class’s constructor • Inline more direct calls using the virtual guard infrastructure – The AOT compile optimistically generates the guard as a NOP – The AOT load evaluates the guard at AOT load time (via a relocation record) – Turn the NOP into a jump to an unresolved call if the relocation record fails • The shared classes cache helps to inline virtual calls on “this” – We can reason about the vtable of the class of the compiled method
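A toy model of the NOP-guard trick, assuming instructions are represented as an enum rather than machine code, and that each guard relocation carries a predicate evaluated against the current JVM at AOT load time. `GuardRelocation` and `processGuards` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy instruction model: just enough to show a patchable guard.
enum class Op { Nop, JumpToUnresolvedCall, InlinedBody, Return };

struct GuardRelocation {
    std::size_t instructionIndex;       // where the guard NOP sits
    std::function<bool()> stillValid;   // evaluated against *this* JVM at load time
};

// AOT load: the compile optimistically emitted each guard as a NOP, so the
// inlined body runs by default. If a guard's assumption fails here, patch the
// NOP into a jump to the out-of-line unresolved call.
void processGuards(std::vector<Op>& code, const std::vector<GuardRelocation>& guards) {
    for (const GuardRelocation& g : guards) {
        assert(code.at(g.instructionIndex) == Op::Nop);
        if (!g.stillValid()) code[g.instructionIndex] = Op::JumpToUnresolvedCall;
    }
}
```

The design choice being illustrated is that the common case (assumptions hold) costs nothing at run time: the guard stays a NOP and execution falls straight into the inlined code.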
  • 61. Using the vtable for virtual “this” calls: class B { public void foo() {…} } class C extends B { void bar() { this.foo(); } } [Figure, JVM Process 1: the resolved C vtable holds the J9Method for the resolved “B.foo()”, whose ROMMethod is foo() from B.class.]
  • 62. No SCC: are B.foo and B’.foo the same? No idea! class B { public void foo() {…} } class C extends B { void bar() { this.foo(); } } [Figure: JVM Process 1 as on the previous slide; in JVM Process 2, the resolved C’ vtable holds the J9Method for the resolved “B’.foo()”, whose ROMMethod is foo() from B’.class, a separate ROMMethod in each process.]
  • 63. SCC: are B.foo and B’.foo the same? Can answer! class B { public void foo() {…} } class C extends B { void bar() { this.foo(); } } [Figure: in JVM Process 1 and JVM Process 2, the J9Methods for the resolved “B.foo()” and “B’.foo()” both point to the same ROMMethod, foo() from B.class, in the memory-mapped SCC: same offset!]
  • 64. Only needs to be the same “enough”: • The ROMMethod includes the bytecodes – If the class’s vtable has a J9Method with the right ROMMethod, then the right bytecodes will be inlined – We still need to be careful about other code aspects, e.g. field offsets – But you know you got the same method implementation • Just like the JIT: – Need to check that there isn’t another possible target – Need to register runtime assumptions against future class loads • Still wrap the inlined code in a guard resolved at AOT load time – If it is not the right or only target: back off to a virtual invocation
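The “same enough” check can be sketched by comparing shared-cache offsets. This assumes a simplified J9Method that records the SCC offset of its ROMMethod (empty if the ROMMethod is not in the cache). The AOT compile stores the vtable slot and that offset; at load time, the inlined body is accepted only if the current JVM’s method in that slot points at the same ROMMethod. All structure and function names here are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified structures.
struct J9Method { std::optional<uint64_t> romMethodSccOffset; };  // empty: ROMMethod not in SCC
struct RAMClass { std::vector<const J9Method*> vtable; };

// Recorded by the AOT compile for an inlined `this.foo()` call.
struct InlinedVirtualSite { std::size_t vtableSlot; uint64_t romMethodSccOffset; };

// AOT load: is the method in this vtable slot backed by the very same
// ROMMethod (hence the same bytecodes) that was inlined? Same offset => yes.
bool inlinedTargetMatches(const RAMClass& clazz, const InlinedVirtualSite& site) {
    if (site.vtableSlot >= clazz.vtable.size()) return false;
    const J9Method* m = clazz.vtable[site.vtableSlot];
    assert(m != nullptr);
    return m->romMethodSccOffset && *m->romMethodSccOffset == site.romMethodSccOffset;
}
```

A real implementation would still need the runtime assumptions and the fallback virtual invocation described above; this sketch only answers “same bytecodes?”.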
  • 65. But we needed something stronger: • Profile guard: C.method profiled as the most common target: if (o.clazz == <common receiver class C address>) { /* inlined C.method() */ } else o.method(); • C needs to be a resolved class • Typically used for interface invokes – Not as straightforward as the vtable
  • 66. We implemented “class chains”: • A list of the superclasses and implemented interfaces for a class – Every one must have a ROMClass in the shared cache – AOT compiles record a “validation relocation” for every referenced resolved class (the offset of a class chain in the SCC) – AOT loads walk class chains in parallel with the resolved classes in the current JVM – Anything not right: bail and requeue the method as a JIT compile • Still one challenge, though: – How do you look up the resolved class pointer for “some class”? – You need a class loader to do that!
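Class chain recording and validation might look like the following sketch. The linearization order (the class, its superclasses, then the direct interfaces of each), the use of offset 0 to mean “not in the shared cache”, and all names are assumptions; superinterfaces of interfaces are ignored for brevity, and the class-loader lookup problem from the slide is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a class resolved in the current JVM.
struct ResolvedClass {
    uint64_t romClassSccOffset;                    // 0 = ROMClass not in the SCC
    const ResolvedClass* superclass;
    std::vector<const ResolvedClass*> interfaces;
};

// A class chain as an AOT compile might store it in the SCC: ROMClass offsets.
using ClassChain = std::vector<uint64_t>;

// Assumed order: the class, its superclasses up to the root, then the direct
// interfaces of each of those classes.
std::vector<const ResolvedClass*> linearize(const ResolvedClass& c) {
    std::vector<const ResolvedClass*> out;
    for (const ResolvedClass* k = &c; k != nullptr; k = k->superclass) out.push_back(k);
    const std::size_t classCount = out.size();
    for (std::size_t i = 0; i < classCount; ++i) {
        const ResolvedClass* k = out[i];
        for (const ResolvedClass* iface : k->interfaces) out.push_back(iface);
    }
    return out;
}

// AOT compile: record the chain; fails if any class lacks a ROMClass in the SCC.
bool recordChain(const ResolvedClass& c, ClassChain& chain) {
    chain.clear();
    for (const ResolvedClass* k : linearize(c)) {
        if (k->romClassSccOffset == 0) return false;
        chain.push_back(k->romClassSccOffset);
    }
    return true;
}

// AOT load: walk the stored chain in parallel with the class as resolved in
// this JVM. Any mismatch means: bail and requeue the method as a JIT compile.
bool validateChain(const ClassChain& chain, const ResolvedClass& c) {
    std::vector<const ResolvedClass*> current = linearize(c);
    assert(!current.empty());
    if (current.size() != chain.size()) return false;
    for (std::size_t i = 0; i < chain.size(); ++i)
        if (current[i]->romClassSccOffset != chain[i]) return false;
    return true;
}
```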
  • 68. Exercise for the audience: how can you find a class loader object in this JVM that corresponds to the “same” class loader object from another JVM? I don’t have time today to tell you how we did it ☹ Come talk to me if you’re really interested!
  • 69. Where do we go with AOT? • Modularity work in JDK 9 is opening up interesting opportunities • Possibility to AOT compile entire modules • Sounds awesome, but not a straightforward win: – We typically don’t know much about the execution profile at load time – AOT code is generally much larger than bytecodes (10X footprint) – Generality/flexibility of JDK libraries could hurt us if we’re not careful • Locales, etc. are not used in all runs, but maybe in some run • Some interesting new possible optimization opportunities – But remember the JIT design principles!
  • 70. Wrap up: • IBM Runtimes are going open source – 800KLOC already contributed to the Eclipse OMR project for all runtimes – Working on the remainder in and around Java 9 development – You’re welcome to join us at Eclipse OMR and, later, Open J9! – Any feedback welcome! • Testarossa is a high-performance, modular compiler technology – 500KLOC now open sourced at Eclipse OMR – Provides steady and significant performance uplift (through effort!) – Around 70 optimizations, with code generators for 4 hardware platforms – Took a deep dive into Testarossa’s AOT compilation technology
  • 71. Thank you! • Mark Stoodley: mstoodle@ca.ibm.com, @mstoodle • Eclipse OMR: www.eclipse.org/omr, www.github.com/eclipse/omr • Other J9 developer talks at JavaOne – Dan Heidinga on Tuesday at 2:30 in Continental Ballroom 1/2/3 – Charlie Gracie on Wednesday at 10am in Golden Gate 2/3 • Visit me and other J9 devs at the IBM booth – I’ll be there tomorrow morning at 9:30am • I will also be at the Eclipse booth on Tuesday from about 4pm to 5:30pm
  • 72. Legal Notice: IBM and the IBM logo are trademarks or registered trademarks of IBM Corporation, in the United States, other countries or both. Java and all Java-based marks, among others, are trademarks or registered trademarks of Oracle in the United States, other countries or both. Other company, product and service names may be trademarks or service marks of others. THE INFORMATION DISCUSSED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, AND IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, SUCH INFORMATION. ANY INFORMATION CONCERNING IBM'S PRODUCT PLANS OR STRATEGY IS SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.