Java bytecode is the form of instructions that the JVM executes.
A Java programmer, normally, does not need to be aware of how Java bytecode works.
Understanding the bytecode, however, is essential in the areas of tooling and program analysis, where applications modify the bytecode to adjust behavior according to their domain. Profilers, mocking tools, AOP, ORM frameworks, IoC Containers, boilerplate code generators, etc. need to understand Java bytecode thoroughly and come up with means of manipulating it at runtime.
Each of these advanced features, which are nowadays standard approaches when programming with Java, requires a sound understanding of Java bytecode, not to mention completely new languages running on the JVM such as Scala or Clojure.
Bytecode manipulation is not easy though ... except with Javassist.
Of all the libraries and tools providing advanced bytecode manipulation features, Javassist is the easiest to use and the quickest to master. An experienced Java developer needs only a few minutes to understand Javassist and use it efficiently. And mastering bytecode manipulation opens a whole new world of approaches and possibilities.
2. 2
» Java bytecode is the form of instructions that the JVM executes.
» A Java programmer, normally, does not need to be aware of how Java bytecode
works.
» Understanding the bytecode is essential for tooling and program analysis,
where the applications can modify the bytecode to adjust the behavior
according to the application's domain.
+ Profilers,
+ Mocking tools,
+ AOP,
+ ORM frameworks,
+ IoC Containers,
+ Boilerplate code generators,
+ etc.
» Bytecode manipulation is not easy though ... except with Javassist.
+ Simple, efficient and natural way
+ An experienced Java developer needs only a few minutes to understand and master it
» And mastering bytecode manipulation opens a whole new world of approaches
and possibilities.
Java Bytecode Manipulation
3. 3
Objectives for today
» Present Javassist and bytecode manipulation
» Introduce it in the light of 2 use cases
+ Boilerplate Code Generation
+ Lightweight and simple IoC Container
4. 4
Bytecode Manipulation ?
» Bytecode manipulation consists in modifying the classes - represented by
bytecode - compiled by the Java compiler, at runtime.
Java Bytecode ?
» Java source files are compiled to Java class files by the Java Compiler. These
Java classes take the form of bytecode. This bytecode is loaded by the JVM to
execute the Java program.
Bytecode Manipulation
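To make the notion of a class file a little more concrete, here is a minimal sketch that reads the first bytes of a compiled class. It relies only on the class file layout defined by the JVM specification (a 4-byte magic number 0xCAFEBABE followed by the minor and major version numbers); the ClassFileHeader class and its read method are hypothetical names introduced for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: holds the first three fields of a class file.
class ClassFileHeader {
    final int magic;   // always 0xCAFEBABE for a valid class file
    final int minor;   // minor version of the class file format
    final int major;   // major version of the class file format

    private ClassFileHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Locates the .class resource of the given type and reads its header.
    static ClassFileHeader read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ClassFileHeader.read(String.class) should yield a magic value equal to 0xCAFEBABE; everything after this header (constant pool, fields, methods and their bytecode) is what manipulation libraries read and rewrite.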
5. 5
1. Introduction
2. Javassist
3. Java Instrumentation framework and applications
4. Use Case A:
Boilerplate Code Generation and Project Lombok
5. Use Case B:
Simple and lightweight IoC Container
6. Conclusion
Agenda
8. 8
» In this article we'll dig into the library Javassist which is a bytecode manipulation
framework
» But first, let's describe two distinct but complementary techniques:
Type Introspection and Runtime Reflection
Runtime Reflection
11. 11
1. Because Javassist attempts to keep its API as close as possible to the Java
Runtime Reflection API, so that it feels as natural as possible to Java
developers.
2. Perhaps more importantly, because behaviour injected into Java classes
using bytecode manipulation is not known to the compiler.
Thus, it is sometimes only available through runtime reflection.
Why important ?
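The second point can be illustrated with a small sketch. When a method only exists because it was injected into the class at load time, calling code cannot reference it at compile time and has to look it up by name. In the sketch below the getter is written in plain source and merely stands in for such a generated method; Person and ReflectiveCall are hypothetical names.

```java
import java.lang.reflect.Method;

// Hypothetical class standing in for one whose getter was added at runtime.
class Person {
    private final String name;

    Person(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

class ReflectiveCall {
    // Looks up a public no-argument method by name at runtime and invokes it.
    static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + methodName, e);
        }
    }
}
```

For example, ReflectiveCall.call(new Person("Ada"), "getName") returns the same value as a direct call to getName(), without the caller needing the method to exist when it is compiled.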
12. 12
3. Bytecode Manipulation
» Bytecode manipulation allows the developer to express instructions in a format
that is directly understood by the Java Virtual Machine, without going through
the compiler to turn source code into bytecode.
» Bytecode is somewhat similar to assembly code directly interpretable by the
CPU, but
+ bytecode is interpreted by a Virtual Machine, the JVM,
+ it is much more understandable than assembly code.
13. 13
Bytecode Manipulation typical use cases:
» ORM frameworks such as Hibernate use bytecode manipulation to inject, for
instance, relationship management code (lazy loading, etc.) inside mapped
entities.
» FindBugs inspects bytecode for static code analysis
» Languages like Groovy, Scala, Clojure generate bytecode from their own source
languages.
» IoC frameworks such as Spring use it to seamlessly weave your application
lifecycle together
» Language extensions like AspectJ can augment the capabilities of Java by
modifying the classes that the Java compiler generated
» etc.
Use Cases
14. 14
» Write your own compiler for any kind of new and crazy language
» Generate on the fly sub-classes of already loaded classes and use them
instead of original classes to get additional behaviour
» Write an instrumentation agent that plugs right into the JVM and modifies
behaviour of classes before they are loaded by the classloader
» etc.
Several ways
Our focus for today
16. 16
» ASM is a project of the OW2 Consortium. It provides a simple API for
decomposing, modifying, and recomposing binary Java classes. ASM exposes
the internal aggregate components of a given Java class through its visitor
oriented API. ASM also provides, on top of this visitor API, a tree API that
represents classes as object constructs. Both APIs can be used for modifying
the binary bytecode, as well as generating new bytecode
» BCEL provides a simple library that exposes the internal aggregate components
of a given Java class through its API as object constructs (as opposed to the
disassembly of the lower-level opcodes). These objects also expose operations
for modifying the binary bytecode, as well as generating new bytecode (via
injection of new code into the existing code, or through generation of new
classes altogether).
» CGLIB is a powerful, high-performance and high-quality Code Generation Library.
It is used to extend Java classes and implement interfaces at runtime. CGLIB is
really oriented towards implementing new classes at runtime, as opposed to
modifying existing bytecode as other libraries do.
» Javassist is a Java library providing a means to manipulate the Java bytecode of
an application. In this sense Javassist provides the support for structural
reflection, i.e. the ability to change the implementation of a class at run time.
Most common libraries (2)
18. 18
From the Javassist web site:
» Javassist (Java Programming Assistant) makes Java bytecode manipulation
simple.
» It is a class library for editing bytecodes in Java; it enables Java programs
+ to define a new class at runtime and
+ to modify a class file at loading time (when the JVM loads it)
» Unlike other similar bytecode editors, Javassist provides two levels of API:
+ source level and
+ bytecode level.
» If the users use the source-level API, they can edit a class file without
knowledge of the specifications of the Java bytecode.
+ The whole API is designed with only the vocabulary of the Java language. You can even
specify inserted bytecode in the form of source text; Javassist compiles it on the fly.
» On the other hand, the bytecode-level API allows the users to directly edit a
class file as other editors do.
Javassist
19. 19
+ to define a new class at runtime and
+ to modify a class file at loading time
» The Linkage problem !
+ Once a class has been loaded, changing it results in a LinkageError
(unless the JVM is launched with the JPDA [Java Platform Debugger
Architecture] enabled, which makes a class dynamically reloadable).
+ Interestingly, Javassist is perfectly able to modify a class long after the
application has started, as long as that specific class has not been
loaded yet.
Runtime vs. Loading Time
20. 20
» Javassist : a high level API around classes, methods, fields, etc.
+ making it as easy as possible to change the implementation of existing classes
+ even implement completely new classes
API
21. 21
ClassPool
» A typical Javassist program first obtains a ClassPool object (for example with
ClassPool.getDefault()), which controls bytecode modification with Javassist.
+ The ClassPool object is a container of CtClass objects representing class files.
+ It reads a class file on demand to construct a CtClass object and records the
constructed object to answer later accesses.
CtClass
» The CtClass object obtained from a ClassPool object can be modified.
» In the introductory example, it is modified so that the superclass of
test.Rectangle is changed to the class test.Point.
+ This change is reflected in the original class file when writeFile() in CtClass is
finally called.
Introduction Example
22. 22
» writeFile() translates the CtClass object into a class file and writes it on a
local disk.
» Javassist also provides a method for directly obtaining the modified bytecode:
toBytecode() in CtClass.
(Bear in mind that this is especially useful when implementing a Java agent)
» You can directly load the CtClass as well, using toClass() in CtClass.
» Finally, a modified class should be returned to the pool to make the enhanced
version available to the ClassLoader:
Saving a modified CtClass
23. 23
» To define a new class from scratch, makeClass() must be called on a
ClassPool.
» The example program defines a class Circle with no members except those
inherited from the parent class Point.
» Member methods of Circle can afterwards be created with factory methods
declared in CtNewMethod and appended to Circle with addMethod() in CtClass.
» makeClass() cannot create a new interface; makeInterface() in ClassPool
must be used instead.
+ Member methods in an interface can be created with abstractMethod() in CtNewMethod.
Note that an interface method is an abstract method.
Defining a new class
24. 24
» Methods are represented by CtMethod objects.
+ CtMethod provides several methods for modifying the definition of the method
» Constructors are represented by their own type in Javassist: CtConstructor.
+ Both CtMethod and CtConstructor extend the same base class and have a lot of their
API in common.
» CtMethod and CtConstructor can be used to completely implement / rewrite a
constructor or a method from scratch.
+ They also provide methods insertBefore(), insertAfter(), and addCatch().
+ These are used for inserting a code fragment into the body of an existing method.
» When implementing or completely rewriting a method from scratch,
using CtNewMethod.make() is in my opinion the most convenient approach.
+ It enables the developer to implement a method by providing Java source
code in a simple string.
Implementing / Modifying a class
30. 30
» Java 5 was the first version with a proper implementation of JSR-163 (Java
Platform Profiling Architecture), including a bytecode instrumentation
mechanism through the introduction of the Java Programming Language
Instrumentation Services - JPLIS.
» At first that JSR only covered native (C) interfaces, but it quickly evolved
towards a pretty convenient Java API.
» The key point of the JSR-163 is JVMTI.
+ JVMTI - or Java Virtual Machine Tool Interface - allows a program to inspect the state
and to control the execution of applications running in the Java Virtual Machine.
+ JVMTI is designed to provide an Application Programming Interface (API) for the
development of tools that need access to the state of the JVM.
+ Examples of such tools are debuggers, profilers and runtime boilerplate code generators.
JSR-163
31. 31
» The Java Instrumentation Framework was an interesting breakthrough since it
made it possible, with the help of an agent, to modify the bytecode of a class's
methods and thereby change its behaviour at runtime.
» The linkage problem
+ Javassist cannot modify a class after it has been loaded by a classloader ... at least
as far as this classloader is concerned.
+ Whenever one tries to modify a class already loaded by the referenced classloader, the
attempt to call pool.makeClass( ... ) will fail and complain that the class is frozen (i.e.
already created via toClass()).
+ Being able to do that would require unloading the class first from the reference
classloader.
+ And that is really pretty difficult (not impossible) …
» The only (easy) way to overcome this problem is to change the class
implementation using bytecode manipulation before the class is loaded by any
classloader.
+ Happily, this is pretty easy using a Java Agent
Java Instrumentation Framework
32. 32
» In its essence, a Java agent is a regular Java class which follows a set of
strict conventions. The agent class must implement a
public static void premain(String agentArgs, Instrumentation inst)
method which becomes an agent entry point (similar to the main method for
regular Java applications).
» Once the Java Virtual Machine (JVM) has initialized, each
such premain(…) method of every agent will be called in the order the agents
were specified on JVM start.
When this initialization step is done, the real Java application main method will
be called.
Java Agents
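As a rough sketch of these conventions, a minimal agent class could look like the following. The class name HelloAgent, its two static fields and the printed message are hypothetical; only the premain signature comes from the java.lang.instrument package. In a real agent jar the class would typically be declared public in its own source file and referenced from the manifest's Premain-Class attribute.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class, for illustration only.
class HelloAgent {
    // Keep what the JVM passed in so the rest of the agent can use it later.
    static volatile String options;
    static volatile Instrumentation instrumentation;

    // Entry point called by the JVM before the application's main method
    // when the jar is passed with -javaagent:<path-to-jar>[=options].
    public static void premain(String agentArgs, Instrumentation inst) {
        options = (agentArgs == null) ? "" : agentArgs;
        instrumentation = inst;
        System.out.println("HelloAgent started with options: '" + options + "'");
    }
}
```

The agentArgs string is whatever follows the "=" sign on the command line; it is null when no options are given, which is why the sketch normalizes it to an empty string.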
33. 33
» A Java agent's premain method takes the Instrumentation entry point -
the interface java.lang.instrument.Instrumentation - as argument.
» The most important API of the java.lang.instrument.Instrumentation interface
is the method void addTransformer(ClassFileTransformer transformer);
» The ClassFileTransformer interface defines one single method
byte[] transform(byte[] …)
that is responsible for applying transformations to a class being loaded.
» The transform(...) method is
called for each and every class
being loaded by a classloader.
Behaviour of Agents
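The following sketch shows one possible shape for such a transformer, using only the JDK. It does not modify anything: it records the names of the classes it sees under a given prefix and returns null, which tells the JVM to keep the original bytecode. ClassNameRecorder and RecorderAgent are hypothetical names; a real transformer would return the modified bytes (for instance produced with Javassist) instead of null.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical transformer: observes class loading without changing bytecode.
class ClassNameRecorder implements ClassFileTransformer {
    // Thread-safe, since classes may be loaded from several threads.
    final List<String> seen = new CopyOnWriteArrayList<>();
    private final String prefix;

    // The prefix uses the internal form with slashes, e.g. "com/example/".
    ClassNameRecorder(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // className may be null for some classes defined at runtime.
        if (className != null && className.startsWith(prefix)) {
            seen.add(className);
        }
        return null; // null means "no transformation performed"
    }
}

// Hypothetical agent that registers the transformer at JVM start.
class RecorderAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassNameRecorder(agentArgs == null ? "" : agentArgs));
    }
}
```

Note that class names are passed in their internal form ("com/example/Foo" rather than "com.example.Foo"), which is easy to overlook when filtering.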
35. 35
» When running from the command line, the Java agent is passed to the JVM
instance using the -javaagent argument, which has the following syntax:
-javaagent:<path-to-jar>[=options].
» A Java agent needs to be packaged in a jar file, and that jar file needs to have a
proper MANIFEST.MF file indicating, through its Premain-Class attribute, the
class containing the premain method.
» A proper manifest file for the agent above should be packaged within the jar
archive containing the agent classes under META-INF/MANIFEST.MF and would
be as follows:
» Now let's imagine we invoke our agent on a simple program defined as follows:
Example (2) - Packaging
37. 37
4. Use Case A :
Boilerplate Code Generation
and Project Lombok
38. 38
» Project Lombok is a Boilerplate code generator
+ Addresses one of the most frequent criticisms of Java: the volume of boilerplate code
+ Boilerplate code : code that is repeated in many parts of an application with only slight
contextual changes and with little added value.
» Project Lombok removes the need to write some of the worst offenders by
replacing each of them with a simple annotation.
» Importantly in our context, Lombok doesn't just generate Java sources or
bytecode: it transforms the Abstract Syntax Tree (AST), by modifying its
structure at compile-time.
+ The AST is a tree representation of the parsed source code, created by the compiler,
similar to the DOM tree model of an XML file.
+ By modifying (or transforming) the AST, Lombok keeps the source code trim and free of
bloat, unlike plain-text code-generation.
+ Lombok's generated code is also visible to classes within the same compilation unit,
unlike direct bytecode manipulation.
Project Lombok
39. 39
» Let's see an example. Imagine the following Java POJO:
» Typical boilerplate code required for such a POJO includes:
+ Getters and Setters for all private fields, making them JavaBean properties
+ A nice toString method giving the values of its properties when an object is output on
the console
+ Consistent hashCode and equals methods making it possible to compare and manipulate
two different objects with the same values
+ A default constructor without any argument (JavaBean standard)
+ An all-args constructor taking all values as arguments to build the instance
Example Class
41. 41
» We want getter(s) / setter(s) :
Without Lombok (2)
42. 42
» We want a toString() method
Without Lombok (3)
43. 43
» We want consistent equals() and hashCode() methods
Without Lombok (4)
44. 44
» No added value : an IDE can write this code for you !
» ratio of [Boilerplate code / Useful Code] of more than 1200% !
Without Lombok (5)
Initial Class: 5 lines of code, 4 fields
Without Lombok: 60 lines of code, 4 fields, 2 constructors, 10 methods
45. 45
» With Lombok, the class becomes:
» All these annotations are straightforward to understand
With Lombok (1)
46. 46
» Thanks to the AST transformation approach, the class really behaves as if all
this boilerplate code had actually been written !
» Much better ratio of [Boilerplate code / Useful Code]
With Lombok (2)
Initial Class: 5 lines of code, 4 fields
With Lombok: 10 lines of code, 4 fields, 5 annotations
48. 48
» The BCG tool (BCG stands for Boilerplate Code Generator) mimics Lombok and
re-implements two features of the Lombok feature set:
+ toString() method generation
+ property getters and setters generation
» BCG is a simple tool that uses Javassist and implements a Java agent.
» BCG is not a production tool or anything like it. It is really just a Javassist
example intended to demonstrate how straightforward, simple and efficient
it would be to re-implement Lombok features using Javassist ...
+ ... should one want to do that, which is not likely since Lombok works so well and is so
easily extensible.
» We will really only be mimicking project Lombok here using bytecode
manipulation.
+ We are not implementing these features the same way Lombok does.
+ Lombok works at compile time using AST transformation.
+ We will be working at runtime using bytecode manipulation.
Use case A: generation of boilerplate code
49. 49
» We want to be able to implement transformers that each perform one
specific modification on target classes and are activated by the presence of one
specific annotation on these classes.
» Key idea : implement a Java Agent that analyzes each and every class just
before it is loaded by the classloader and verifies whether this class needs to be
transformed.
» We want to implement Transformers that recognize classes declaring a specific
annotation and proceed with the transformation of these classes.
» We want the system to be easily extensible with new transformers.
Principle
53. 53
» Inversion of Control is a design pattern related to lifecycle management of
components in an application benefiting from a services architecture.
» In such an application, business components are usually implemented in the
form of various services : business services, business managers, DAOs, etc.
+ The main class delegates specific business concerns to business services,
+ which in turn delegate finer aspects to managers,
+ which further delegate various business or technical aspects to smaller managers,
DAOs, adapters, etc.
» Managing the construction and instantiation of these services is called
component lifecycle management.
» Very often, business services are stateless components.
» Traditionally, for a very long time, these stateless services have been
implemented as singletons.
+ this was a very convenient approach since the main singleton simply needed to get the
other singletons it was using,
+ which in turn simply needed to get the other singletons they were using, and so on.
Application Lifecycle Management and Singletons
54. 54
The problem with Singletons
+ Difficult to unit test
+ Hide Dependencies
+ Promote Tight Coupling
+ Violate SRP
55. 55
» Inversion of Control was initially mostly an answer to this problem. It aims to:
+ increase the modularity of the application,
+ make it more extensible,
+ and, more importantly, make it easier to test by removing the strict dependencies
between components.
» Key idea: delegate lifecycle management and injection of dependencies to a
container
+ the container takes care of instantiating the components, managing their lifecycle in the
required scope, and injecting their dependencies at runtime.
» Injecting the dependencies at runtime, with a configurable approach, using a
configuration file, annotations or even a dedicated API, opens the possibility to
inject a different implementation of a service depending on the context, as long
as it respects the required interface.
+ Mock objects
+ Different Iand prod implementations
+ Etc.
Inversion of Control
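A minimal hand-wired sketch of the idea, without any container: the consumer declares what it needs through an interface and receives an implementation from the outside, so a test can pass in a mock while production code passes in the real service. All names here (PriceService, CatalogPriceService, FixedPriceService, Checkout) are hypothetical; a container essentially automates the wiring that this sketch does by hand in the constructor call.

```java
// Hypothetical service interface: the only thing the consumer depends on.
interface PriceService {
    int priceInCents(String sku);
}

// Stand-in for a production implementation (a real one would query a data source).
class CatalogPriceService implements PriceService {
    public int priceInCents(String sku) {
        return sku.length() * 100;
    }
}

// Stand-in for a mock used in unit tests: every item costs the same.
class FixedPriceService implements PriceService {
    private final int price;

    FixedPriceService(int price) {
        this.price = price;
    }

    public int priceInCents(String sku) {
        return price;
    }
}

// The consumer receives its dependency instead of fetching a singleton.
class Checkout {
    private final PriceService prices;

    Checkout(PriceService prices) {
        this.prices = prices;
    }

    int total(String... skus) {
        int sum = 0;
        for (String sku : skus) {
            sum += prices.priceInCents(sku);
        }
        return sum;
    }
}
```

Because Checkout only knows the PriceService interface, new Checkout(new FixedPriceService(250)) and new Checkout(new CatalogPriceService()) are interchangeable from its point of view.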
56. 56
IoC Container
» With IoC, a container, called a lightweight container (as opposed to Java EE
containers, which are in my opinion very heavy and very bad), takes care of
instantiating the components and managing their lifecycle as well as, more
importantly, injecting the dependencies into every component.
» In a usual application, by contrast, the lifecycle of components starts with a
main component (or class) that either creates the other services it requires
or gets their singletons.
» These other components, in their turn, create or get references to their own
dependencies, and so on.
57. 57
» The Spring Framework is an application framework and inversion of control
container for the Java platform. The core of Spring is really about IoC and
component management, but nowadays there is a complete ecosystem of tools
and side frameworks around Spring core aimed at developing web applications,
ORM concerns, etc.
» PicoContainer is a very lightweight IoC Container and only that. Unlike
Spring, it is designed to remain small and simple and targets only IoC concerns,
nothing else. It is not heavily maintained.
» Apache Tapestry is an open-source component-oriented Java web application
framework conceptually similar to JavaServer Faces and Apache Wicket. It
provides IoC features in addition to the web application framework.
» Google Guice is an open source software framework for the Java platform
released by Google. It provides support for dependency injection using
annotations to configure Java objects.
Various Frameworks
59. 59
» Implementing Dependency Injection is actually a classic use case for
Javassist and a nice way to present the possibilities and ins and outs of
bytecode manipulation.
» We'll now see how to use Javassist in the light of a concrete use case: the
implementation, in a little more than 300 lines of code, of a lightweight, simple but
cute IoC Container: SCIF - Simple and Cute IoC Framework.
Use Case B : Implementing an IoC Container
60. 60
» SCIF - the system we want to build - is an MVP
» We want it to implement Dependency Injection in its simplest form:
+ Services are managed by the framework and stored in a Service Registry
+ Services should declare the annotation @Service to be discovered by the framework.
The framework searches for services declaring this annotation in the classpath.
+ Dependencies are identified in services using the annotation @Resource. The
framework analyzes services to discover their dependencies at runtime.
○ If @Resource is declared on a field, the framework injects the dependency directly, at build time.
○ If @Resource is declared on a getter, the framework uses bytecode manipulation to override the
getter in a subclass and implement lazy loading of the dependency.
+ In case of getter (property) injection instead of field injection, SCIF has to generate
a sub-class of the initial class and override the getter in that sub-class to implement
lazy loading.
Principle
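The field-injection half of this principle can be sketched with plain reflection, as below. This is not SCIF's actual code: the annotations are redeclared locally, the services are passed explicitly instead of being discovered on the classpath, and the getter case (which needs a generated sub-class and therefore bytecode manipulation) is left out. MiniRegistry, Clock and Reporter are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-ins for the @Service and @Resource annotations named in the text.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Service {}

@Retention(RetentionPolicy.RUNTIME) @Target({ElementType.FIELD, ElementType.METHOD})
@interface Resource {}

@Service
class Clock {
    long now() { return 42L; } // fixed value to keep the sketch deterministic
}

@Service
class Reporter {
    @Resource private Clock clock; // filled in by the registry
    String report() { return "t=" + clock.now(); }
}

// Hypothetical service registry: instantiates services, then injects fields.
class MiniRegistry {
    private final Map<Class<?>, Object> services = new LinkedHashMap<>();

    MiniRegistry(Class<?>... candidates) {
        try {
            // Step 1: instantiate every candidate that declares @Service.
            for (Class<?> type : candidates) {
                if (type.isAnnotationPresent(Service.class)) {
                    Constructor<?> ctor = type.getDeclaredConstructor();
                    ctor.setAccessible(true);
                    services.put(type, ctor.newInstance());
                }
            }
            // Step 2: inject registered services into @Resource fields by type.
            for (Object service : services.values()) {
                for (Field field : service.getClass().getDeclaredFields()) {
                    if (!field.isAnnotationPresent(Resource.class)) continue;
                    Object dependency = services.get(field.getType());
                    if (dependency == null) {
                        throw new IllegalStateException("No service for " + field);
                    }
                    field.setAccessible(true);
                    field.set(service, dependency);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot build registry", e);
        }
    }

    <T> T get(Class<T> type) {
        return type.cast(services.get(type));
    }
}
```

With new MiniRegistry(Clock.class, Reporter.class), the Reporter obtained from the registry already has its clock field set. Instantiating everything first and injecting afterwards keeps the sketch independent of the order in which services are listed.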
65. 65
» Bytecode manipulation is a lot of fun and opens a whole new world of
possibilities on the JVM.
» It's the only way to implement advanced tooling such as IoC Containers, ORM
frameworks, boilerplate code generators, etc.
» Normally, bytecode manipulation is rather difficult to achieve ...
except with Javassist.
» Javassist makes bytecode manipulation easy and straightforward. The ability
to write actual Java source code dynamically in simple strings and add it on the
fly as bytecode to the classes being manipulated is striking.
» Javassist is in my opinion the simplest way to perform bytecode manipulation in
Java.
» My own use cases:
+ ENTER / LEAVE
+ jt-property …
The problem here is that one cannot unload a single class from a ClassLoader. A class may be unloaded if it and its ClassLoader become unreachable, but since every class refers to its loader, that implies that all classes loaded by this loader must become unreachable too. Of course one can (re-)create the class using a different ClassLoader, but that would require making the whole program use that new ClassLoader, and this becomes fairly complicated. At the end of the day, that would require reloading the whole application and initializing everything all over again. This makes no sense.
Dependencies are hidden
In using the Singleton pattern, you use a global instance and hide the dependencies.
The dependencies are hidden inside your code and are not exposed through interfaces.
Doesn't enforce layering
Increases tight coupling
Makes it too easy to get references to services
The Singleton design pattern promotes tight coupling between the classes in your application. Tight coupling increases the maintenance cost, as maintenance and code refactoring become difficult. The reason is that changing one component in your application is difficult, since the change affects all other components that are connected to it.
Unit tests become difficult
This increased coupling due to usage of the Singleton design pattern makes creating fakes during unit tests rather difficult. In other words, usage of the Singleton design pattern makes your life painful when writing unit tests since it becomes very difficult to identify the dependency chains so that the components in your application can be unit tested properly. Most importantly, you would need to use static methods when implementing the Singleton design pattern. Static methods make unit testing difficult since you cannot mock or stub.
Violates the Single Responsibility Principle
Another point to note here is that the Singleton design pattern violates the Single Responsibility Principle since the objects control how they are created and manage their life-cycle. This clearly contradicts the Single Responsibility Principle which states that a class should have one and only one reason for change.
Inversion of Control and Dependency Injection are two different, yet strongly related, things that are often confused in documentation:
Inversion of Control - IoC : is the name of the approach. It is usually considered a design pattern which, in my opinion, is wrong: IoC is an architecture pattern. But that is really no big deal.
Dependency Injection - DI : is the name of a technique, a mechanism on which IoC often relies. It consists in injecting the components required by a specific component at runtime, based on some configuration rules. DI is really just one aspect of IoC.