Jamaica: The Java Virtual Machine (JVM) Macro Assembler
By James Jianbo Huang
March 2004
Abstract
Jamaica, the JVM Macro Assembler, is an easy-to-learn and easy-to-use assembly language for JVM bytecode programming. It uses Java syntax to define a JVM class except for the method body, which takes bytecode instructions, including Jamaica's built-in macros. In Jamaica, bytecode instructions use mnemonics and symbolic names for all variables, parameters, data fields, constants and labels.
Jamaica is a simplified JVM assembly language. It does not support inner classes. Variables are all method-wide and are strongly typed. Jamaica is a language facade for a Java class creation API,
Why Jamaica?
Even with the rigid JVM architecture and verification, creating JVM classes
at bytecode level is still highly risky and error-prone. With Jamaica, you can quickly
experiment dynamically creating classes; once done, mechanically convert the Jamaica source
code into
1. Introduction
Jamaica, or JVM Macro Assembler, is an assembly language for JVM bytecode programming. It uses syntax that is almost identical to the Java language except for the method body, where the program is written in bytecode instructions. Class fields, methods and local variables are all declared with Java syntax. Symbolic labels are used instead of absolute addresses, and variable and field names are used in the instructions rather than their indices. This makes it very easy to do JVM assembly programming.
JVM, Bytecode Assembly Programming
The Java Virtual Machine (JVM for short) is a specification defined by Sun Microsystems. It is by no means a simple reflection of the Java programming language. It only understands a particular binary format, the class file format, which contains a symbol table (constant pool), data fields, methods with JVM instructions (so-called bytecodes), and other ancillary information. The Java virtual machine imposes strong format and structural constraints on a class file for the sake of security.
Even though the JVM is totally separate from the Java language, it is certainly designed with Java in mind. It has facilities to support all Java features such as various kinds of classes (including interfaces), synchronization instructions and Java data type support. Since the class is the center of the JVM, and the class structure is so well defined and closely reflects the needs of the Java programming language, this "CPU" is quite different from traditional ones. Even for JVM assembler programmers, there is not much one can do to control the structure, hence understanding the class file format doesn't really help much. It is quite easy to hide the complexity of enforcing the structural constraints behind Java-like syntax and let the assembler do the dirty work.
The JVM bytecode instructions ubiquitously use indices to reference fields, variables and constant pool entries. Jamaica exclusively uses symbolic names in all instructions. Therefore, programmers can focus on programming with JVM instructions, fields and variables. This is the design goal of Jamaica.
Introduction to Jamaica
The Jamaica language is a language to specify JVM bytecode instructions in a Java-like class structure that will be compiled into a JVM class. Its syntax is mostly the same as the Java programming language; that is, the class, initialization, member declarations and variable declarations are all the same as in Java.
Assembly language programs typically have a lot of patterns that are used repeatedly. Jamaica defines a number of useful macros, hence "macro" in its name. Because Jamaica is strongly typed, these macros greatly simplify programmers' lives: many JVM instructions require the type of their operands to be spelled out, and the macros work it out from the declared types. The current version is a little shy of being a true "macro assembly language" as it does not support user-defined macros. Let's take a look at an example.
The code is self-explanatory, assuming you are somewhat familiar with the JVM bytecode
instructions as well as Java class format in general. Outside of Java methods, it is just in
Java syntax. Note that Java class names and types are used like in Java. Java built-in
classes in packages
Macros are cool. Look at
Use of Jamaica
What's the use of Jamaica? One of the main purposes is to study the JVM in order to create
Java classes on the fly without compilation. As you will discover (if you haven't done that
yet), creating Java classes directly with bytecode instructions is very error-prone and
takes a lot of effort. Jamaica is a relatively high-level language for this purpose and lets
you focus on the class you want to create rather than the nitty-gritty details of class file
structure, so you can quickly experiment with your class creation. Jamaica is a language
facade for a Java class creation API,
JVM Specification Conformance
Jamaica is not related to Java, nor is it obliged to support all the features of the JVM specification. It does not support inner classes. It does not allow reuse of variables with different types or ranges. This means all variable slots are strongly typed, something that is not rigidly enforced by the JVM. All these are quite minor, and frankly, it is probably better off without those features after all. (But, opinions differ.)
JVM bytecode instructions use index numbers such as constant pool entry numbers, field numbers and local variable slot numbers. Because Jamaica is a symbolic assembler, you can't specify instructions in their "raw" format, nor can you manipulate the constant pool entries. This is exactly the reason for this symbolic assembler.
Jamaica Assembler and Tools
The Jamaica assembler is the java class
The most popular and convenient tool to inspect the generated class file is
Jamaica, being an assembler, does little code verification and optimization, because it uses
textual information for class creation as much as possible to avoid dependencies on other Java
classes. Therefore, you can generate invalid code! You should always test it; one
way is to run the generated class even if it does not contain the start-up method
There are other tools for verifying and inspecting Java classes, such as utilities included in the Jakarta-BCEL package.
2. Lexical Rules of a Jamaica Source File
To avoid confusion, from this point on, we use "Java" for the Java programming language. The following are the lexical rules for Jamaica.
3. Defining Classes or Interfaces
These are the syntactic and semantic rules for defining a JVM class or interface.
It looks like a lot of rules, but they are all intuitive if you know Java (who doesn't?). The following is an example.
Run it with the following command line; it generates the file CSecondCls.class. Move it to the right place in the classpath and run it, with the following result:
    % java com.judoscript.jamaica.Main CSecondCls.ja
    % mv CSecondCls.class xyz/
    % java xyz.CSecondCls
    CSecondCls@3f5d07
    getLong() = 0
    getList() = null
    getInt() = 0
    getSA() = null
    ---------------
    CSecondCls@3f5d07
    getLong() = 100
    getList() = []
    getInt() = 4
    getSA() = [Ljava.lang.String;@cac268
    ---------------
Class-Level Macros
Jamaica supports these class-level macros:
4. Method Body
In essence, Jamaica the language is almost identical to Java except for the content of the class method body. In Jamaica, bytecode instructions (and macros, which are collections of instructions) are specified for program logic instead of Java statements and expressions. Jamaica uses symbolic names throughout for variables and labels, and Java data type syntax, so the code is still familiar. The following is the syntax for a method body, which also includes class initialization blocks:
MethodBody ::=
Where VariableDecl is the same Java syntax for declaring local variables, and primitive type values can be initialized. Variables must be declared before they can be used. Although they can be declared anywhere in the code, they are all of method-wide access. This is different from Java and the JVM specification. There are no sub-scopes within a method body. Variables are also strongly typed, and this type information is used by many macros.
In JVM, method parameters are actually local variables, therefore they are accessed exactly like variables. Don't declare variables with the same name as any of the parameters.
Exception and Finally Handlers
Before the end of the method body, catch clauses can be specified to handle exceptions.
CatchClause ::=
The first label (inclusive) and second label (exclusive) designate the catch block, i.e., the
specified exception happening in this range of code will be caught and control is transferred
to the third label. If the exception class name is not specified, this clause catches any
kind of
JVM has no explicit support for Java's
Bytecode Programming
As demonstrated in the examples earlier, bytecode instructions use mnemonics, symbolic names and Java-style data types. The macros also take advantage of the strong typing of variables and data members, which makes them easier to use than the instructions, most of which require the data type to be spelled out.
For non-static methods, the keyword this names the current object reference, so aload this is the same as aload_0.
The easy way of programming in Jamaica is to cheat. Suppose you want to implement something like this Java method:
    public int max(int[] vals) {
      try {
        int max = vals[0];
        for (int i=1; i
Compile the Java class first, then use the javap -c tool to disassemble the code and get this:
    Method int max(int[])
       0 aload_1
       1 iconst_0
       2 iaload
       3 istore_2
       4 iconst_1
       5 istore_3
       6 goto 23
       9 aload_1
      10 iload_3
      11 iaload
      12 iload_2
      13 if_icmple 20
      16 aload_1
      17 iload_3
      18 iaload
      19 istore_2
      20 iinc 3 1
      23 iload_3
      24 aload_1
      25 arraylength
      26 if_icmplt 9
      29 iload_2
      30 ireturn
      31 astore_2
      32 aload_2
      33 invokevirtual #3
You can mechanically convert this into Jamaica. In this program, variable #0 is this, #1 is vals, the parameter, #2 is a local variable max, and #3 is another local variable i. We keep the line numbers for reference.
    public int max(int[] vals) {
      int max, i;
       0 begin:  aload vals
       1         iconst_0
       2         iaload
       3         istore max
       4         iconst_1
       5         istore i
       6         goto check
       9 loop:   aload vals
      10         iload i
      11         iaload
      12         iload max
      13         if_icmple cont
      16         aload vals
      17         iload i
      18         iaload
      19         istore max
      20 cont:   iinc i 1
      23 check:  iload i
      24         aload vals
      25         arraylength
      26         if_icmplt loop
      29         iload max
      30         ireturn
      31         //astore_2  // javac re-uses slot #2 for the Exception object
      32         //aload_2   // we simply call its method so no need for such
      33 action: invokevirtual Exception.printStackTrace()void
      36         //goto 39   // obviously redundant
      39         iconst_0
      40         ireturn
      catch (begin, action) action
    }
When you become better at Jamaica, especially its handy macros, JVM assembly programming can be a lot easier and fun as well.
    public int max(int[] vals) {
      int max, i;
    begin:
      %set max = vals[0]
      %array_iterate vals i
        %if vals[i] > max
          %set max = vals[i]
        %end_if
      %end_iterate
      %load max
      ireturn
    action:
      invokevirtual Exception.printStackTrace()void
      iconst_0
      ireturn
      catch (begin, action) action
    }
This is actually very close to Java code. You may feel this is going away from low-level bytecode programming. Well, you can always choose to use bytecode instructions directly. The problem with that is that many commonly used patterns are repeated again and again, each taking many instructions, and readability becomes really poor. What is nice about Jamaica macros is that the underlying Java class creator, JavaClassCreator, supports all these macros, so this code can be faithfully converted to JavaClassCreator calls.
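For comparison, here is a Java method that is consistent with the javap listing in this section. It is a reconstruction from the bytecode (the compare at offset 13, the loop test at offset 26, and the handler that prints the stack trace and returns 0), not the author's original source; the class name MaxDemo and the main method are assumptions added so the sketch can run on its own.

```java
// Hypothetical reconstruction from the bytecode listing; MaxDemo is an assumed name.
final class MaxDemo {
    // Returns the largest element of vals, or 0 if any exception is thrown
    // (for example when vals is null or empty), as the handler in the listing does.
    public int max(int[] vals) {
        try {
            int max = vals[0];
            for (int i = 1; i < vals.length; i++) {
                if (vals[i] > max) {      // if_icmple skips the store when vals[i] <= max
                    max = vals[i];
                }
            }
            return max;
        } catch (Exception e) {
            e.printStackTrace();
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(new MaxDemo().max(new int[] {3, 9, 4}));
    }
}
```

Compiling a class like this and running javap -c on it is a quick way to check a hand-written Jamaica method against what javac produces.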
5. Introduction to Instructions
Introduction to the JVM Runtime
When a method is called, a new frame is allocated to store state information during the
method execution; it is discarded when the method returns. Frames are maintained on the stack
of the current thread; each thread has its own stack. Within the frame, there are numerous
pieces of information, such as local variables and the operand stack. JVM is a stack-based
machine; instructions receive values and return results on the operand stack, and parameters
are passed to method calls on it as well. The local variables in a frame include the current
object reference (for non-static methods), the method parameters and the declared variables.
Both operand stack and local variables are one word (32 bits) wide. Most values are one word, except for long and double values, which are two words (64 bits).
Constant Loading Instructions
JVM has instructions to load constants onto the top of the stack. Constants can be of type
integer, long, float, double, string and
Instruction
Instructions
JVM also has single-byte instructions for commonly-used constant values. For integer, they
are:
For other values and strings, use ldc:
    ldc 129832       // integer
    ldc (long)232    // long and becomes ldc2_w
    ldc 5.5          // double and becomes ldc2_w
    ldc (float)5.5   // float
    ldc "ABCD"
    ldc "ABCD"       // only one entry for "ABCD" in the constant pool
    ldc 1234         // Jamaica optimizes this to "sipush 1234"
    ldc 123          // Jamaica optimizes this to "bipush 123"
    ldc 2            // Jamaica optimizes this to "iconst_2"
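The choice among the short constant forms follows from the operand ranges in the JVM instruction set: iconst_m1 through iconst_5 cover -1 to 5, bipush takes a signed byte (-128 to 127), sipush takes a signed short, and anything larger needs a constant pool entry via ldc. The sketch below shows that selection for int values only. How Jamaica performs this optimization internally is not described here, so the class ConstantLoad and its method forInt are purely illustrative names.

```java
// Illustrative only: ConstantLoad and forInt are assumed names, not part of Jamaica.
final class ConstantLoad {
    // Picks the shortest JVM instruction that can push the given int constant.
    static String forInt(int value) {
        if (value == -1) {
            return "iconst_m1";
        }
        if (value >= 0 && value <= 5) {
            return "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // operand is a signed byte
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // operand is a signed short
        }
        return "ldc " + value;          // falls back to a constant pool entry
    }

    public static void main(String[] args) {
        int[] samples = {2, 100, 1234, 129832};
        for (int v : samples) {
            System.out.println(v + " -> " + forInt(v));
        }
    }
}
```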
There is no boolean type in JVM. Use 1 for true and 0 for false.
Using Symbolic Constants
Jamaica supports symbolic constants. Anywhere a constant value is expected, a constant name or a class's static-final data member can be used with this syntax:
All programming macros take constants, so do these instructions:
The constant is a simple name, it is one of these: a static-final primitive type data member
already defined in the current class or its parent class and/or implemented interfaces, or a
constant name explicitly defined via %const:
    %const clob = java.sql.Types.CLOB
    public class CTest {
      public static void main(String[] args) {
        %ldc {clob}                  // becomes sipush 2005
        %ldc {java.sql.Types.CLOB}   // becomes sipush 2005
        pop
        pop
      }
    }
Variable Access Instructions
Variables in a JVM method are allocated slots in the runtime frame. Each slot is one word (four bytes) wide; long and double values take two slots (eight bytes). In JVM, few instructions deal with variables directly (the only exception is iinc); values are moved between variables and the stack with the <type>load and <type>store instructions, where <type> is one of the following: i, l, f, d and a, for integer, long, float, double and any, respectively.
So to copy a long value stored in variable foo into bar, you do this:
    lload foo
    lstore bar
JVM has single-byte shorthand instructions to access the first 4 variables; they take the form <type>load_<n> and <type>store_<n>, where <n> is 0 through 3. These instructions are supported by Jamaica; however, they demand extra caution. Let's take a look at an example:
    1 public void amethod(String msg) {
    2   long lvar;
    3   int ivar;
    4   iload_3
    5   i2l
    6   lstore_2
    7 }
At first glance, this code looks innocent. But line 4 is wrong: slot 0 holds this and slot 1 holds msg, and the next variable, lvar, is a long and takes two slots (2 and 3), so the slot number for ivar is 4.
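The slot arithmetic in this example can be checked with a small sketch. It assumes slots are handed out in declaration order, with this first for non-static methods and two slots for each long or double; whether Jamaica allocates in exactly this order is an assumption, and SlotLayout and assign are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: SlotLayout is an assumed name; allocation order is an assumption.
final class SlotLayout {
    // Assigns a slot number to each name in declaration order.
    // types[i] is the Java type of names[i]; long and double occupy two slots.
    static Map<String, Integer> assign(boolean isStatic, String[] names, String[] types) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        int next = 0;
        if (!isStatic) {
            slots.put("this", next++);   // slot 0 is the current object
        }
        for (int i = 0; i < names.length; i++) {
            slots.put(names[i], next);
            boolean wide = types[i].equals("long") || types[i].equals("double");
            next += wide ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        Map<String, Integer> slots = assign(false,
                new String[] {"msg", "lvar", "ivar"},
                new String[] {"String", "long", "int"});
        System.out.println(slots);   // prints {this=0, msg=1, lvar=2, ivar=4}
    }
}
```

Using the symbolic forms (iload ivar, lstore lvar) avoids this bookkeeping entirely, which is the safer habit in Jamaica.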
One instruction is frequently used: aload_0, which in a non-static method loads the current object. aload_0 and aload this are exactly the same.
Array Operation Instructions
Arrays in JVM are treated like objects. Array elements can have all Java types. In addition to the simple type counterparts, array elements can also be boolean, byte, char and short.
To access their attributes and data elements, JVM has dedicated instructions. All these instructions need to have the array instance itself loaded on the stack first. The element access instructions are <type>aload and <type>astore, where <type> is one of the following: i, l, f, d, a, b, c and s for integer, long, float, double, any, boolean/byte, char and short, respectively. So for a double array darr, to copy the element at 0 to 4, do this:
    aload darr   // load the array instance
    dup          // it will be used twice here
    iconst_0     // array index 0
    daload       // load the double value on the stack
    dstore tmp   // save it
    bipush 4     // array index 4
    dload tmp    // get the other value
    dastore      // put the value into the array (at 4)
Data Member Access Instructions
A JVM class can have class-wide (static) and instance-wide (non-static) data members. The following instructions are used to access data members:
In Jamaica, if the class name for the field is missing, it is assumed the field is in the current class. The class name and type seem redundant but that is one way JVM enforces data security. For non-static data members, the object that owns the field is loaded on the stack first. Here is an example:
    class MyClass {
      PrintStream out;
      MyClass() {
        aload this                         // the object that owns the field
        getstatic System.out PrintStream
        putfield out PrintStream
      }
    }
Data Type Instructions
These instructions convert the value on the stack top to a different numeric type:
Object Creation Instructions
The instruction new creates an object; its constructor is then invoked on it, as in this example:
    new StringBuffer
    dup
    bipush 100
    invokespecial StringBuffer
To create a single-dimensional array, use one of these instructions: newarray for primitive element types, or anewarray for object element types. They both take the array size from the stack top. Here is an example:
    // to create int[9]
    bipush 9
    newarray int
    // to create String[10]
    bipush 10
    anewarray String
Multi-dimensional arrays are created with the multianewarray instruction, which takes the array type and dimensions, where dimensions is the number of dimensions to be created. The sizes of each dimension must be placed on the stack first. Here is an example:
    // to do:
    //   byte[][][] a = new byte[19][19][];
    //   a[1][2] = new byte[3];
    bipush 19
    bipush 19
    multianewarray byte[][][] 2
    dup
    astore a
    iconst_1
    aaload
    iconst_2
    iconst_3
    newarray byte
    aastore
Arithmetic and Logical Instructions
These instructions do arithmetic calculations on the parameters from the stack and store the result on the stack. Their mnemonics start with <type>, where <type> is one of the following: i, l, f and d for integer, long, float and double.
These instructions do logical and shifting operations on the parameters from the stack and store the result on the stack. Their mnemonics start with <type>, where <type> is one of the following: i and l for integer and long.
The instruction iinc takes a variable and an increment and adds the increment to that int variable in place,
where increment is an integer constant.
Stack Manipulation Instructions
JVM has a number of instructions to manipulate the stack top. The reason may be that JVM
has no registers whatsoever, and these instructions may help speed up certain operations.
Whatever the reason, here they are:
Method Invocation Instructions
JVM has four method invocation instructions:
Unconditional Jump Instructions
The program execution can be unconditionally changed to a location that may not be the next
instruction in the flow by an absolute
UnconditionalJump ::=
where <type> is one of the following: i, l, f, d and a for integer, long, float, double and any.
As usual, the wide versions of those instructions are optional; their "narrow" counterparts can be used in place of them and will be converted to wide if necessary.
Conditional Jump Instructions
The following instructions compare two integers and jump accordingly:
where <op> is one of the following: eq for equal, ne for
not-equal, lt for less-than, le for less-or-equal, gt for
greater-than and ge for greater-or-equal.
For two objects, their equality or non-equality can be compared with these instructions:
To compare two long, float or double values, you need to first invoke one of these
instructions:
where <op> is one of the following: eq , ne , lt ,
le , gt and ge .
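The two-step pattern for long values can be modelled in plain Java. In the JVM instruction set, lcmp pops two longs and pushes -1, 0 or 1, and an if<op> instruction then tests that int against zero. The sketch below imitates both steps; CompareThenBranch, lcmp and ifOp are illustrative names for this model, not Jamaica or JVM APIs, and the float and double variants (which also have to deal with NaN) are left out.

```java
// Illustrative model only: class and method names are assumptions.
final class CompareThenBranch {
    // Models lcmp: the int left on the stack after comparing a with b.
    static int lcmp(long a, long b) {
        return a < b ? -1 : (a == b ? 0 : 1);
    }

    // Models if<op>: decides whether the jump is taken for the given comparison result.
    static boolean ifOp(String op, int cmp) {
        switch (op) {
            case "eq": return cmp == 0;
            case "ne": return cmp != 0;
            case "lt": return cmp < 0;
            case "le": return cmp <= 0;
            case "gt": return cmp > 0;
            case "ge": return cmp >= 0;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        // Equivalent of: lload a; lload b; lcmp; iflt label
        long a = 5L, b = 7L;
        System.out.println(ifOp("lt", lcmp(a, b)));
    }
}
```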
To test whether an object reference is null or not, use these instructions:
JVM also defines two switch instructions that do multi-way branching.
tableswitch is a high-performance switch statement: the multiple choices must be consecutive numbers, so it just needs a first value and a number of labels. lookupswitch (and its Jamaica synonym, switch) takes a number of integer constants and their associated labels. Jamaica optimizes this if the constant values happen to be consecutive.
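In the JVM instruction set, tableswitch is the form indexed by a consecutive range of keys and lookupswitch is the form that searches a list of key and label pairs. The check an assembler could make before picking the denser form can be sketched as follows. This is only a guess at the kind of test involved, not Jamaica's actual logic, and SwitchChoice and choose are hypothetical names.

```java
import java.util.Arrays;

// Illustrative only: SwitchChoice is an assumed name, not Jamaica's implementation.
final class SwitchChoice {
    // Returns "tableswitch" when the keys form one run of consecutive ints,
    // and "lookupswitch" otherwise (including for an empty key list).
    static String choose(int[] keys) {
        if (keys.length == 0) {
            return "lookupswitch";
        }
        int[] sorted = keys.clone();
        Arrays.sort(sorted);
        for (int i = 1; i < sorted.length; i++) {
            // Widen to long so the difference cannot overflow.
            if ((long) sorted[i] - sorted[i - 1] != 1) {
                return "lookupswitch";
            }
        }
        return "tableswitch";
    }

    public static void main(String[] args) {
        System.out.println(choose(new int[] {3, 1, 2}));
        System.out.println(choose(new int[] {1, 5, 9}));
    }
}
```

A production compiler typically also accepts small gaps and fills them with the default label; that refinement is omitted here.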
Other Instructions
The
There are two synchronization instructions,
6. Introduction to Executable Macros
Jamaica executable macros (or simply, macros) greatly simplify JVM bytecode assembly programming. They cover these areas:
%set macro.
Executable Macro Parameters
Macro parameters can be a constant, a simple name or an array element expression. No other expressions are available (for now). Syntactically,
Param ::= Constant | name
The names in the parameters are resolved in this order: if a variable is found with that name, that variable is used; otherwise, if a data member (static or otherwise) is found with that name, that field is used. This is an example:
%print, %println and %flush
The syntax for the print macro is:
The TargetName is either out (for System.out) or err (for System.err). By default, it is out. These macros can take a variable number of parameters. For println, the whole list is printed without line breaks except for one at the end. For flush, the whole list is printed as with print, followed by a call to the flush() method.
%load
The syntax for the load macro is:
The value, whether a constant, a variable, a field or an array element, is loaded onto the
top of the stack.
%set
The syntax for the set macro is:
The right-hand side value, whether a constant, a variable, a field or an array element, is assigned to the variable, field or array element on the left-hand side.
%object
The syntax for the object creation macro is:
This creates an object of that class on the stack top and invokes its constructor.
%array
There are two ways to create an array and put it onto the stack: one by specifying the dimensions, the other by giving initialization values for a single-dimensional array.
%concat
This macro concatenates all the parameters into a single string and puts it onto the stack:
%if, %else and %end_if
The if-else structure is familiar to any programmer. Jamaica supports all the comparison operations and handles types automatically. The syntax for the if-else macro is:
If the comparison expression is a single parameter, it is treated as a boolean and is tested for being greater than 0.
%iterate and %end_iterate
The syntax of the iterate macro is:
where Param must be evaluated to either a java.util.Iterator or a java.util.Enumeration. During each iteration, if the iterate variable is specified, that element is stored there; otherwise, it is put on the top of the stack. The types of the elements and the iterate variable must be compatible. E.g.,
    public String toCSV(List list) {
      Iterator iter;
      %load list
      invokevirtual List.iterator()Iterator
      astore iter
      StringBuffer sb;
      %object StringBuffer
      dup
      astore sb
      dup
      boolean first;
      %set first = true
      %iterate iter
        %if first
          %set first = false
        %else
          %load sb
          ldc ','
          invokevirtual StringBuffer.append(char)void
        %end_if
        // stack top are: sb and element
        invokevirtual StringBuffer.append(Object)void
        dup // sb
      %end_iterate
      pop
      invokevirtual StringBuffer.toString()String
      areturn
    }
%array_iterate and %end_iterate
The syntax of the array iterate macro is:
where Param must be evaluated to an array, and the index variable must be an int. In the iterations, the index variable is incremented from 0 to the array length minus one. E.g.,
    int[] arr;
    %set arr = %array int[]{ 9, 8, 7 }
    int idx;
    %array_iterate arr idx
      %println "arr[", idx, "]=", arr[idx]
    %end_iterate
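Read as Java, the array iteration example corresponds to an ordinary indexed for loop. The sketch below mirrors it, collecting the lines instead of printing them directly so they are easy to inspect. It rests on the description above (index running from 0 to length minus one, println joining its parameters without separators); ArrayIterateDemo and lines are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative Java counterpart of the macro example; names are assumptions.
final class ArrayIterateDemo {
    // Produces one line per element, in the shape the %println call describes.
    static List<String> lines(int[] arr) {
        List<String> out = new ArrayList<>();
        for (int idx = 0; idx < arr.length; idx++) {
            out.add("arr[" + idx + "]=" + arr[idx]);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : lines(new int[] {9, 8, 7})) {
            System.out.println(line);
        }
    }
}
```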
7. Summary
Jamaica is a macro assembly language for the Java VM. It uses the Java syntax for the class or interface definition except for the method bodies, where JVM bytecode instructions are used. Within the method body, variables can be defined and exception handlers can be specified. The instructions all use symbolic names for variables, fields and labels and never use indices that the JVM instruction set has defined. The details of a class file such as constant pools are totally hidden. This is because JVM specification has defined such a rigid JVM structure that programmers have no liberty nor interest to handle these by themselves. In addition, Jamaica supports a number of macros for common patterns that are intelligently expanded into sets of instructions, hence the name Jamaica for the JVM Macro Assembler. This is possible because Jamaica is a strongly-typed language, e.g., each named variable is specified with a type. Jamaica does not support creating inner classes or interfaces, but it can use inner classes or interfaces.
Dynamically creating Java classes at bytecode level is extremely tedious and error-prone.
Jamaica removes most of the chores of managing class file details and greatly simplifies
this task. It is implemented by a Java API,
8. Code Listings