Table of Contents

  1. Introduction
    » JVM, Bytecode Assembly Programming
    » Introduction to Jamaica
    » Use of Jamaica
    » JVM Specification Conformance
    » Jamaica Assembler and Tools
    » Download
  2. Lexical Rules of a Jamaica Source File
  3. Defining Classes or Interfaces
    » Class-Level Macros
  4. Method Body
    » Exception and Finally Handlers
    » Bytecode Programming
  5. Introduction to Instructions
    » Introduction to the JVM Runtime
    » Constant Loading Instructions
    » Using Symbolic Constants
    » Variable Access Instructions
    » Array Operation Instructions
    » Data Member Access Instructions
    » Data Type Instructions
    » Object Creation Instructions
    » Arithmetic and Logical Instructions
    » Stack Manipulation Instructions
    » Method Invocation Instructions
    » Unconditional Jump Instructions
    » Conditional Jump Instructions
    » Other Instructions
  6. Introduction to Executable Macros
    » Executable Macro Parameters
    » %print, %println and %flush
    » %load
    » %set
    » %object
    » %array
    » %concat
    » %if, %else and %end_if
    » %iterate and %end_iterate
    » %array_iterate and %end_iterate
  7. Summary
  8. Code Listings

Jamaica: The Java Virtual Machine (JVM) Macro Assembler

By James Jianbo Huang    March 2004

Abstract   Jamaica, the JVM Macro Assembler, is an easy-to-learn and easy-to-use assembly language for JVM bytecode programming. It uses Java syntax to define a JVM class except for the method body that takes bytecode instructions, including Jamaica's built-in macros. In Jamaica, bytecode instructions use mnemonics and symbolic names for all variables, parameters, data fields, constants and labels. Jamaica is a simplified JVM assembly language. It does not support inner classes. Variables are all method-wide and are strongly-typed.

Jamaica is a language facade for a Java class creation API, JavaClassCreator. This API closely mimics the Jamaica language, allows users to define a Java class with the same flow, and supports all the Jamaica instruction set and macros.

Why Jamaica? Even with the rigid JVM architecture and verification, creating JVM classes at the bytecode level is still highly risky and error-prone. With Jamaica, you can quickly experiment with dynamically creating classes; once done, mechanically convert the Jamaica source code into JavaClassCreator API calls. Jamaica is currently the only macro assembler for the JVM, and serves this purpose very well. This is the Jamaica user's manual, covering the syntax for all JVM bytecode instructions and Jamaica macros. It is not meant to be a reference for the JVM, Java class files, the JVM architecture or the JVM runtime environment; these are introduced only where necessary, as background for bytecode programming.


 

1. Introduction

Jamaica, or JVM Macro Assembler, is an assembly language for JVM bytecode programming. Its syntax is almost identical to the Java language except for the method body, where the program is written in bytecode instructions. Class fields, methods and local variables are all declared with Java syntax. Symbolic labels are used instead of absolute addresses, and variable and field names are used in the instructions rather than their indices. This makes JVM assembly programming very easy.

JVM, Bytecode Assembly Programming

The Java Virtual Machine (JVM for short) is a specification defined by Sun Microsystems. It is by no means a simple reflection of the Java programming language. It only understands a particular binary format, the class file format, which contains a symbol table (constant pool), data fields, methods with JVM instructions (so-called bytecodes), and other ancillary information. The Java virtual machine imposes strong format and structural constraints on a class file for the sake of security.

Even though the JVM is totally separate from the Java language, it is certainly designed with Java in mind. It has facilities to support all Java features, such as various kinds of classes (including interfaces), synchronization instructions and Java data type support. Since the class is the center of the JVM, and the class structure is so well defined and closely reflects the needs of the Java programming language, this "CPU" is quite different from traditional ones. Even for JVM assembler programmers, there is not much one can do to control the structure, hence understanding the class file format doesn't really help much. It is quite easy to hide the complexity of enforcing the structural constraints behind Java-like syntax and let the assembler do the dirty work. The JVM bytecode instructions ubiquitously use indices to reference fields, variables and constant pool entries. Jamaica exclusively uses symbolic names for all instructions. Therefore, programmers can focus on programming with JVM instructions, fields and variables. This is the design goal of Jamaica.

Introduction to Jamaica

The Jamaica language specifies JVM bytecode instructions in a Java-like class structure that is compiled into a JVM class. Its syntax is mostly the same as the Java programming language; that is, the class, initialization, member declarations and variable declarations are all the same as in Java. The package and import declarations are also supported. The executable code is written in JVM bytecode instructions; the instructions use mnemonics and symbolic names for labels, fields and variables. No indices are used or allowed. Jamaica is a strongly typed language (more strongly typed than the JVM, where variables are just slots and types are less strictly enforced).

Assembly language programs typically have a lot of patterns that are used repeatedly. Jamaica defines a number of useful macros; hence "macro" in its name. Because Jamaica is strongly typed, using those macros greatly simplifies programmers' lives, since many JVM instructions force you to specify the type of their operands. The current version is a little shy of being a true "macro assembly language" as it does not support user-defined macros.

Let's take a look at an example.

Listing 1. CFirstCls1.ja
public class CFirstCls
{
  int count;

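  public CFirstCls() {
    aload this    // putfield needs the owning object below the value
    iconst_0
    putfield count int
    return
  }

  public void inc(int amount) {
    aload this    // object reference for the putfield below
    aload this    // object reference for getfield
    getfield count int
    iload amount
    iadd
    putfield count int
    return
  }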

  public void printSelf() throws IOException {
    getstatic System.out PrintStream
    aload this    // same as aload_0
    invokevirtual PrintStream.println(Object)void
    return
  }

  public String toString() {
    ldc "It's only me!"
    areturn
  }
}

The code is self-explanatory, assuming you are somewhat familiar with the JVM bytecode instructions as well as the Java class format in general. Outside of method bodies, it is plain Java syntax. Note that Java class names and types are used as in Java. Built-in classes in the packages java.lang, java.io and java.util can be used without a package prefix or explicit import declarations. Next is a version of the same class that uses macros.

Listing 2. CFirstCls2.ja
public class CFirstCls
{
  int count;

  public CFirstCls() {
    %set count = 0
  }

  public void inc(int amount) {
    aload this    // object reference for the putfield below
    %load count
    %load amount
    iadd
    putfield count int
  }

  public void printSelf() throws IOException {
    %println this
  }

  public String toString() {
    %concat "CFirstCls<", count, '>'
    areturn
  }
}

Macros are cool. Look at toString(), where the concatenated string is put on top of the stack for consumption. With plain bytecode instructions, this program would be much longer. Besides the use of macros, you may have noticed one thing: the return statements are missing for methods of void return type. Jamaica inserts one automatically if it does not see one.

Use of Jamaica

What's the use of Jamaica? One of its main purposes is to study the JVM in order to create Java classes on the fly without compilation. As you will discover (if you haven't already), creating Java classes directly with bytecode instructions is very error-prone and takes a lot of effort. Jamaica is a relatively high-level language for this purpose and lets you focus on the class you want to create rather than the nitty-gritty details of the class file structure, so you can quickly experiment with your class creation. Jamaica is a language facade for a Java class creation API, JavaClassCreator. This class is modeled after Jamaica; it uses symbolic labels and field/variable names, supports all Jamaica macros, and the flow of creating a class is identical to specifying a class in Jamaica. Currently there are a couple of implementations, which use the ASM package and the Jakarta-Apache BCEL package. See the JavaClassCreator documentation for more about using this class.

JVM Specification Conformance

Jamaica is not related to Java, nor is it obliged to support all the features of the JVM specification. It does not support inner classes. It does not allow reuse of variables with different types or ranges. This means all variable slots are strongly typed, something that is not rigidly enforced by the JVM. All these are quite minor, and frankly, it is probably better off without those features after all. (But, opinions differ.)

JVM bytecode instructions use index numbers such as constant pool entry numbers, field numbers and local variable slot numbers. Because Jamaica is a symbolic assembler, you can't specify instructions in their "raw" format, nor can you manipulate the constant pool entries. Hiding these indices is exactly the reason for this symbolic assembler.

Jamaica Assembler and Tools

The Jamaica assembler is the Java class com.judoscript.jamaica.Main. It takes a Jamaica source file (by convention with extension ".ja") and generates a class file. The class name is specified in the source and may be different from the file name, although it is highly recommended to keep them the same.

The most popular and convenient tool to inspect the generated class file is javap, which comes with the JDK installation. Make sure the generated class is in the classpath, then run javap with the -c option to show the bytecode. It displays the bytecode in its own format, which is different from Jamaica's but visually similar.

Jamaica, being an assembler, does little code verification and optimization, because it uses textual information for class creation as much as possible to avoid depending on other Java classes. Therefore, you can generate invalid code! You should always test it; one way is to run the generated class even if it does not contain the start-up method main().

There are other tools for verifying and inspecting Java classes, such as utilities included in the Jakarta-BCEL package.

 


 

2. Lexical Rules of a Jamaica Source File

To avoid confusion, from this point on, we use "Java" for the Java programming language. The following are the lexical rules for Jamaica.

  1. Comments are the same as Java single-line and multi-line comments.
  2. Jamaica identifiers are Java identifiers. All Java reserved words are Jamaica reserved words; Jamaica has no extra reserved words at the level of class/interface declaration.
  3. Within method bodies and class static initialization blocks, bytecode instruction mnemonics are considered reserved words, and should not be used as names for variables, parameters and labels. Refer to the instruction sets for all the mnemonics.
  4. All data type names, including Java primitive types, class and interface names and array names, are the same as Java data type names. No JVM style data type names are used.
  5. All macro names start with %.
  6. Bytecode instructions and macros are not terminated by any terminator character. They are not required to be on a single line, although this is highly recommended for readability.

 


 

3. Defining Classes or Interfaces

These are the syntactic and semantic rules for defining a JVM class or interface.

  1. The class or interface name is a simple name without package prefix.
  2. The package prefix, if present, must be specified first, with the same Java syntax.
  3. Following the optional package prefix declaration and before the class or interface declaration, there may be zero or more import declarations with the same Java syntax.
  4. Where a class name is expected, the class name is resolved via the following pseudo code:
      if the class name has a package prefix, i.e., contains dots, then
        use it as-is;
      otherwise, i.e., it is a simple one without package prefix, then
        if there is an exact match in the import list (see below) then
          use the first (or only) match
        else
          if there is a non-exact match in the import list (see below) then
            use the first (or only) match
          else
            use the simple class name as-is
          end if
        end if
      end if
    
    Notice that the resolved class name may or may not represent a valid Java class at compile time. The rules for matching a name against an import list are:
    1. An exact match is found when a complete class name is specified in the import (e.g. java.sql.Date), and the class name is same as the name after the last dot.
    2. A non-exact match is found when an import declaration ends with an asterisk and the class name can be resolved into a class in that package at compile-time.
    3. These packages are auto-imported: java.lang.*, java.io.*, and java.util.*. Therefore, classes and interfaces in these packages can be used directly without package prefixes.
    Inner class names use dollar signs ($) to separate the outer and inner class names.
  5. Class data members are defined with Java syntax, but they cannot be assigned initial values. Initial values for instance members are assigned in the constructor(s), and for static members in the class initialization block(s).
  6. The only exception to the above rule is for static final data members of primitive types, whose values must be assigned, with the same Java syntax. (Static final non-primitive-type members are still assigned in the initialization block(s).)
  7. Methods are declared with the same Java syntax except for the content of the method bodies. If a method declaration ends with a ;, it is assumed to be abstract and the abstract attribute is optional. This is true for both interface and class methods.
  8. There can be zero or more class initialization blocks with the same Java syntax.
  9. The contents of method bodies and initialization blocks can contain variable declarations, bytecode instructions, labels and exception tables. This is described in greater detail below.
  10. There can be class-level macros to simplify your life where appropriate. See the next section.
  11. For constant values, there are special uses that are described in detail below.

It looks like a lot of rules, but they are all intuitive if you know Java (who doesn't?). The following is an example.

Listing 3. CSecondCls.ja
package xyz;

public class CSecondCls implements Serializable
{
  public static final int MAX = 5;  // static/final/primitive: must initialize.
  public static final HashMap symbols;

  static  long      lSFld;
  static  ArrayList oSFld;

  private int       iFld;
  private String[]  saFld;

  static {
    %set symbols = %object HashMap // static final
    %set lSFld = 0
    %set oSFld = null
  }

  %default_constructor <public>

  public static long getLong()       { getstatic lSFld long  lreturn }
  public static void setLong(long v) { lload v  putstatic lSFld long }

  public static List getList()            { getstatic oSFld ArrayList  areturn }
  public static void setList(ArrayList v) { aload v  putstatic oSFld ArrayList }

  public int  getInt()      { aload_0  getfield iFld int  ireturn }
  public void setInt(int v) { aload_0  iload v  putfield iFld int }

  public String[] getSA()       { aload_0  getfield saFld String[]  areturn }
  public void setSA(String[] v) { aload_0  aload v  putfield saFld String[] }

  public String toString() {
    String parentString;
    long lV;
    List listV;
    int  iV;
    String[] saV;

    aload this
    invokespecial Object.toString()String
    astore parentString

    invokestatic getLong()long
    lstore lV

    invokestatic getList()List
    astore listV

    aload_0     // load this object
    invokevirtual getInt()int
    istore iV

    aload this  // same as aload_0
    invokevirtual getSA()String[]
    astore saV

    // now, format the string
    %concat parentString, "\ngetLong() = ", lV, "\ngetList() = ", listV,
            "\ngetInt()  = ", iV, "\ngetSA()   = ", saV, "\n---------------"
    areturn     // the string is on the stack top
  }

  // Test it out.
  public static void main(String[] args) {
    CSecondCls obj;
    %set obj = %object CSecondCls

    %println obj

    // Call their methods and print again.

    %load obj
    ldc 4
    invokevirtual setInt(int)void

    %load obj
    %array String[] { "ABCD", "EFG", "HIJK" }
    invokevirtual setSA(String[])void

    ldc (long)100
    invokestatic setLong(long)void

    %object ArrayList
    invokestatic setList(ArrayList)void

    %println obj
  }
}

Assemble it with the first command below, which generates the file CSecondCls.class; move that file to the right place in the classpath and run it to get the following result:

% java com.judoscript.jamaica.Main CSecondCls.ja
% mv CSecondCls.class xyz/
% java xyz.CSecondCls
CSecondCls@3f5d07
getLong() = 0
getList() = null
getInt()  = 0
getSA()   = null
---------------
CSecondCls@3f5d07
getLong() = 100
getList() = []
getInt()  = 4
getSA()   = [Ljava.lang.String;@cac268
---------------

Class-Level Macros

Jamaica supports these class-level macros:

  • If the parent class has a default constructor, and there is no specific object initialization, then this macro can be used to define a default constructor:
    %default_constructor [ < ( public | protected | private ) > ]

 


 

4. Method Body

In essence, the Jamaica language is almost identical to Java except for the content of the method body. In Jamaica, program logic is specified with bytecode instructions (and macros, which are collections of instructions) instead of Java statements and expressions. Jamaica uses symbolic names for all variables and labels, and Java data type syntax, so the code is still familiar. The following is the syntax for a method body, which also applies to class initialization blocks:

MethodBody ::= { ( VariableDecl | [ Label : ] Instruction )* ( CatchClause )* }
where VariableDecl uses the same Java syntax for declaring local variables, and variables of primitive types can be given initial values.

Variables must be declared before they can be used. Although they can be declared anywhere in the code, they are all of method-wide access. This is different from Java and the JVM specification. There are no sub-scopes within a method body. Variables are also strongly typed, and this type information is used by many macros.

In JVM, method parameters are actually local variables, therefore they are accessed exactly like variables. Don't declare variables with the same name as any of the parameters.

Exception and Finally Handlers

Before the end of method body, catch clauses can be specified to handle exceptions.

CatchClause ::=
catch [ ClassName ] ( Label , Label ) Label

The first label (inclusive) and second label (exclusive) designate the protected range of code, i.e., the specified exception happening in this range will be caught and control is transferred to the third label. If the exception class name is not specified, this clause catches any java.lang.Throwable.
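How a catch clause is matched at run time can be sketched in Java. This is a simplified model of an exception table lookup, not code from Jamaica or the JVM; CatchTable, Entry and findHandler are hypothetical names, and code positions are modeled as plain integers.

```java
final class CatchTable {
    /** One catch clause: range [startPc, endPc) jumps to handlerPc. */
    static final class Entry {
        final int startPc;
        final int endPc;
        final int handlerPc;
        final Class<? extends Throwable> catchType; // null means "catch anything"

        Entry(int startPc, int endPc, int handlerPc,
              Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    /** Returns the handler position for 'thrown' raised at 'pc', or -1 if none. */
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {               // entries are tried in order
            boolean inRange = pc >= e.startPc && pc < e.endPc; // end is exclusive
            boolean typeMatches = e.catchType == null || e.catchType.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.handlerPc;
            }
        }
        return -1;                            // not handled in this method
    }

    public static void main(String[] args) {
        Entry[] table = { new Entry(0, 30, 31, Exception.class) };
        System.out.println(findHandler(table, 11, new ArrayIndexOutOfBoundsException()));
        System.out.println(findHandler(table, 30, new RuntimeException()));
    }
}
```

The second call finds no handler because position 30 is the exclusive end of the range, which mirrors the "second label (exclusive)" rule.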

The JVM has no explicit support for Java's finally clause. When a finally clause is specified for a block, the Java compiler makes sure all branches invoke that handler before exiting the method. That is, the finally clause is a Java construct, not a JVM one. In Jamaica, this becomes a style issue and you can choose to do anything.

Bytecode Programming

As demonstrated in the examples earlier, bytecode instructions use mnemonics, symbolic names and Java style data types. The macros also take advantage of the strong typing of variables and data members, which makes them easier to use than instructions, which usually require explicit data types.

For non-static methods, keyword this is used to denote the current object. In JVM method calls, this is always the first parameter (index 0), so these two statements are equivalent:

  aload this
  aload_0

The easy way of programming in Jamaica is to cheat. Suppose you want to implement something like this Java method:

  public int max(int[] vals) {
    try {
      int max = vals[0];
      for (int i=1; i<vals.length; ++i)
        if (vals[i] > max)
          max = vals[i];
      return max;
    } catch(Exception e) {
      e.printStackTrace();
    }
    return 0;
  }
Compile the Java class first, then use the javap -c tool to disassemble the code and get this:
Method int max(int[])
   0 aload_1
   1 iconst_0
   2 iaload
   3 istore_2
   4 iconst_1
   5 istore_3
   6 goto 23
   9 aload_1
  10 iload_3
  11 iaload
  12 iload_2
  13 if_icmple 20
  16 aload_1
  17 iload_3
  18 iaload
  19 istore_2
  20 iinc 3 1
  23 iload_3
  24 aload_1
  25 arraylength
  26 if_icmplt 9
  29 iload_2
  30 ireturn
  31 astore_2
  32 aload_2
  33 invokevirtual #3 <Method void printStackTrace()>
  36 goto 39
  39 iconst_0
  40 ireturn
Exception table:
   from   to  target type
     0    30    31   <Class java.lang.Exception>
You can mechanically convert this into Jamaica. In this program, variable #0 is this, #1 is the parameter vals, #2 is the local variable max, and #3 is the local variable i. We keep the bytecode offsets from the javap output for reference.
  public int max(int[] vals) {
            int max, i;

 0   begin: aload vals
 1          iconst_0
 2          iaload
 3          istore    max
 4          iconst_1
 5          istore    i
 6          goto      check
 9    loop: aload     vals
10          iload     i
11          iaload
12          iload     max
13          if_icmple cont
16          aload     vals
17          iload     i
18          iaload
19          istore    max
20    cont: iinc      i 1
23   check: iload     i
24          aload     vals
25          arraylength
26          if_icmplt loop
29          iload     max
30          ireturn
31          //astore_2  // javac re-uses slot #2 for the Exception object
32          //aload_2   // we simply call its method so no need for such
33  action: invokevirtual Exception.printStackTrace()void
36          //goto 39   // obviously redundant
39          iconst_0
40          ireturn

    catch (begin, action) action
  }
When you become better at Jamaica, especially its handy macros, JVM assembly programming can be a lot easier and fun as well.
  public int max(int[] vals) {
    int max, i;
begin:
    %set max = vals[0]
    %array_iterate vals i
      %if vals[i] > max
        %set max = vals[i]
      %end_if
    %end_iterate
    %load max
    ireturn
action:
    invokevirtual Exception.printStackTrace()void
    iconst_0
    ireturn

    catch (begin, action) action
  }
This is actually very close to Java code. You may feel this is moving away from low-level bytecode programming. Well, you can always choose to use bytecode instructions directly. The problem with that is that many commonly used patterns are repeated again and again, each taking many instructions, and readability becomes really poor. What is nice about Jamaica macros is that the underlying Java class creator, JavaClassCreator, supports all of them, so this code can be faithfully converted to JavaClassCreator calls.

 


 

5. Introduction to Instructions

Introduction to the JVM Runtime

When a method is called, a new frame is allocated to store state information during the method execution; it is discarded when the method returns. Frames are maintained on a stack of the current thread. Each thread has its own stack. Within the frame, there are numerous pieces of information, such as local variables and the operand stack. The JVM is a stack-based machine; instructions receive values and return results on the operand stack, which is also used to pass parameters to method calls. The local variables in a frame include the current object reference this as the first one (for non-static methods), followed by the invocation parameters and the method's local variables.

Both the operand stack and the local variables are one word (32 bits) wide. Most values are one word, except for long and double values, which are two words (64 bits).
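To make the frame model concrete, here is a toy sketch in Java of a frame with local variable slots and an operand stack. It is an illustration only and handles int values exclusively; MiniFrame and its method names are hypothetical, chosen to echo the instruction mnemonics.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class MiniFrame {
    final int[] locals;                               // local variable slots
    final Deque<Integer> stack = new ArrayDeque<>();  // operand stack

    MiniFrame(int maxLocals) {
        locals = new int[maxLocals];
    }

    void iconst(int value) { stack.push(value); }          // push a constant
    void iload(int slot)   { stack.push(locals[slot]); }   // variable -> stack
    void istore(int slot)  { locals[slot] = stack.pop(); } // stack -> variable

    void iadd() {                                     // pop two ints, push the sum
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    /** Models a static method "int add(int x, int y)": iload 0, iload 1, iadd, ireturn. */
    static int add(int x, int y) {
        MiniFrame frame = new MiniFrame(2);  // a new frame per call
        frame.locals[0] = x;                 // parameters arrive as local variables
        frame.locals[1] = y;
        frame.iload(0);
        frame.iload(1);
        frame.iadd();
        return frame.stack.pop();            // ireturn hands back the stack top
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Every instruction either moves a value between a slot and the stack or consumes operands from the stack and pushes a result; nothing operates on two variables directly.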

Constant Loading Instructions

The JVM has instructions to load constants onto the top of the stack. Constants can be of type integer, long, float, double, string and null.

Instruction aconst_null loads null.

Instructions bipush and sipush push small integer values; bipush takes a byte parameter, and sipush takes a short (double-byte) parameter.

The JVM also has single-byte instructions for commonly used constant values. For integer, they are: iconst_m1 (for minus-1) and iconst_0 through iconst_5; for long, lconst_0 and lconst_1; for float, fconst_0 through fconst_2; and for double, dconst_0 and dconst_1.

For other values and strings, ldc and its variants, ldc_w and ldc2_w, load constants from the class's constant pool. In the JVM, these instructions take as a parameter an index number that points to a constant pool entry: ldc takes a byte index, while the "wide" versions take a double-byte index. ldc2_w is for loading long and double constants. In Jamaica, no index numbers are used. These instructions just take a constant literal as their parameter. Jamaica also handles the wideness and value size for ldc; that is, you can always specify ldc regardless of the size of the index number or the size of the value. Here are a few examples:

ldc 129832      // integer
ldc (long)232   // long and becomes ldc2_w
ldc 5.5         // double and becomes ldc2_w
ldc (float)5.5  // float
ldc "ABCD"
ldc "ABCD"      // only one entry for "ABCD" in the constant pool
ldc 1234        // Jamaica optimizes this to "sipush 1234"
ldc 123         // Jamaica optimizes this to "bipush 123"
ldc 2           // Jamaica optimizes this to "iconst_2"
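The choice among these integer-loading instructions follows from their operand ranges: iconst_m1 through iconst_5 cover -1 to 5, bipush takes a signed byte (-128 to 127), and sipush takes a signed short (-32768 to 32767). A minimal Java sketch of such a selection is shown below; IntConstant and instructionFor are hypothetical names, and this is not Jamaica's actual code.

```java
final class IntConstant {
    /** Picks the shortest JVM instruction that loads the given int constant. */
    static String instructionFor(int value) {
        if (value >= -1 && value <= 5) {
            // Single-byte instructions for the most common values.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;      // operand is a signed byte
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;      // operand is a signed short
        }
        return "ldc " + value;             // anything larger comes from the constant pool
    }

    public static void main(String[] args) {
        int[] samples = { 2, 100, 234, 1234, 129832 };
        for (int v : samples) {
            System.out.println(v + " -> " + instructionFor(v));
        }
    }
}
```

Note that a value such as 234 does not fit in a signed byte, so it needs sipush rather than bipush.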

There is no boolean type in JVM. Use 1 for true and 0 for false.

Using Symbolic Constants

Jamaica supports symbolic constants. Anywhere a constant value is expected, a constant name or a class's static-final data member can be used with this syntax:

{ [ ClassName . ] name}

All programming macros take constants, so do these instructions: iinc, ldc (and ldc_w and ldc2_w), bipush, sipush, switch (including tableswitch and lookupswitch).

If the constant is a simple name, it is one of these: a static-final primitive type data member already defined in the current class, its parent class or its implemented interfaces, or a constant name explicitly defined via the %const macro prior to the class/interface declaration. The constant value is obtained at compile time. Here is an example:

%const clob = java.sql.Types.CLOB

public class CTest
{
  public static void main(String[] args) {
    ldc {clob}                 // becomes  sipush 2005
    ldc {java.sql.Types.CLOB}  // becomes  sipush 2005
    pop
    pop
  }
}

Variable Access Instructions

Variables in a JVM method are allocated slots in the runtime frame. Each slot is one word (four bytes) wide; long and double values take two slots (eight bytes). In the JVM, almost no instructions operate on variables directly (the only exception is iinc); values of variables need to be loaded onto, or stored from, the top of the stack via the JVM's load and store instructions. In Jamaica, variables, including method parameters, are represented by symbolic names; within the JVM, however, they are represented by slot numbers. The syntax is:

( <type>load | <type>store ) variable
where <type> is one of the following: i, l, f, d and a, for integer, long, float, double and any (object reference), respectively. So to copy a long value stored in variable foo into bar, you do this:
lload  foo
lstore bar

JVM has single-byte shorthand instructions to access the first 4 variables; they are:

<type>load_<0-3> | <type>store_<0-3>
These instructions are supported by Jamaica; however, they demand extra caution. Let's take a look at an example:
1  public void amethod(String msg) {
2    long lvar;
3    int  ivar;
4    iload_3
5    i2l
6    lstore_2
7  }
At first glance, this code looks innocent. But line 4 is wrong: slot 0 holds this and slot 1 holds msg, and lvar is a long that takes two slots (2 and 3), so the slot number for ivar is 4.
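The slot arithmetic behind this example can be sketched in Java. The sketch assumes slots are handed out in declaration order, with this first for non-static methods, then the parameters, then the local variables; SlotAllocator and assign are hypothetical names and do not come from Jamaica.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SlotAllocator {
    /**
     * Assigns a slot number to each variable in declaration order.
     * Each entry of 'vars' is { type, name }; long and double take two slots.
     */
    static Map<String, Integer> assign(boolean isStatic, String[][] vars) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        int next = 0;
        if (!isStatic) {
            slots.put("this", next++);     // slot 0 of a non-static method
        }
        for (String[] v : vars) {
            String type = v[0];
            String name = v[1];
            slots.put(name, next);
            boolean twoWords = type.equals("long") || type.equals("double");
            next += twoWords ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // Mirrors amethod(String msg) with locals "long lvar" and "int ivar".
        String[][] vars = { {"String", "msg"}, {"long", "lvar"}, {"int", "ivar"} };
        System.out.println(assign(false, vars));
    }
}
```

Using symbolic names such as iload ivar avoids this bookkeeping entirely, which is why the shorthand forms deserve extra caution.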

One instruction is frequently used: aload_0. In a non-static method, the object instance for this method is pushed as the first variable and always occupies slot #0. In Jamaica, keyword this is used for the same purpose. Hence,

aload_0
aload this
are exactly the same.

Array Operation Instructions

Arrays in the JVM are treated like objects. Array elements can have all Java types: in addition to the types available to variables and stack values (int, long, float, double and object reference), array elements can also be boolean, byte, char and short.

To access their attributes and data elements, the JVM has dedicated instructions. All these instructions need the array instance itself to be loaded on the stack first. Instruction arraylength returns the length of the array on the stack top. The syntax for the array element access instructions is:

<type>aload | <type>astore
where <type> is one of the following: i, l, f, d, a, b, c and s for integer, long, float, double, any, boolean/byte, char and short, respectively. So for a double array darr, to copy the element at index 0 to index 4, do this:
  aload    darr // load the array instance
  dup           // it will be used twice here
  iconst_0      // array index 0
  daload        // load the double value on the stack
  dstore   tmp  // save it
  bipush   4    // array index 4
  dload    tmp  // get the other value
  dastore       // put the value into the array (at 4)

Data Member Access Instructions

A JVM class can have class-wide (static) and instance-wide (non-static) data members. The following instructions are used to access data members:

( getfield | getstatic | putfield | putstatic ) [ ClassName . ] FieldName type
In Jamaica, if the class name for the field is missing, it is assumed the field is in the current class. The class name and type seem redundant but that is one way JVM enforces data security. For non-static data members, the object that owns the field is loaded on the stack first. Here is an example:
class MyClass
{
  PrintStream out;

  MyClass() {
    aload this                        // the object that owns the field
    getstatic System.out PrintStream
    putfield  out        PrintStream
  }
}

Data Type Instructions

These instructions convert the value on the stack top to a different numeric type:

i2l | i2f | i2d | i2b | i2c | i2s |
l2i | l2f | l2d | f2i | f2l | f2d | d2i | d2l | d2f

Instruction checkcast checks the object on the stack top for a particular class and throws an exception if the object is not compatible with the class. Instruction instanceof checks the object against a particular class and returns a boolean value (0 or 1) on the stack. Their syntax is:

( checkcast | instanceof ) ClassName

Object Creation Instructions

The instruction new ClassName creates an instance of that class on the stack. It must be initialized by an explicit call to one of its constructors. The following is an example:

new StringBuffer
dup
bipush 100
invokespecial StringBuffer(int)void
// now the StringBuffer object on the stack is ready

To create a single dimensional array, use one of these instructions:

newarray PrimitiveType | anewarray ClassName
They both take the array length from the stack top. Here is an example:
// to create int[9]
bipush 9
newarray int
// to create String[10]
bipush 10
anewarray String

Multi-dimensional arrays are created with this instruction:

multianewarray DataType ( [] )* dimensions
where dimensions is the number of dimensions to be created. The size of each of those dimensions must be placed on the stack first. Here is an example:
// to do:
//   byte[][][] a = new byte[19][19][];
//   a[1][2] = new byte[3];
bipush 19
bipush 19
multianewarray byte[][][] 2
dup
astore a
iconst_1
aaload
iconst_2
iconst_3
newarray byte
aastore
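
The array-creation examples above correspond to the following Java sketch (the class and method names are made up for illustration). It makes visible what multianewarray leaves unallocated when fewer dimensions are created than the type declares.

```java
// Java counterparts of the newarray, anewarray and multianewarray examples.
final class ArrayCreationSketch {
    static int[] ints()       { return new int[9]; }      // bipush 9, newarray int
    static String[] strings() { return new String[10]; }  // bipush 10, anewarray String

    static byte[][][] partial() {
        byte[][][] a = new byte[19][19][];  // multianewarray byte[][][] 2
        a[1][2] = new byte[3];              // aaload, newarray byte, aastore
        return a;
    }
}
```

Only the first two dimensions exist after the multianewarray; every a[i][j] other than a[1][2] is still null.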

Arithmetic and Logical Instructions

These instructions do arithmetic calculations on the operands from the stack and store the result on the stack:

<type>add | <type>sub | <type>mul | <type>div | <type>rem | <type>neg
where <type> is one of the following: i, l, f and d for integer, long, float and double.

These instructions do logical and shifting operations on the operands from the stack and store the result on the stack:

<type>shl | <type>shr | <type>ushr | <type>and | <type>or | <type>xor
where <type> is one of the following: i and l for integer and long.

Instruction iinc is the only one that operates directly on a local variable rather than on the stack. Its syntax is:

iinc variable increment
where increment is an integer constant.
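
A few of these instructions have semantics worth checking with negative operands. The Java sketch below (names are illustrative) uses operators that javac compiles to the instruction named by each method.

```java
// Each method body compiles to the single int instruction it is named after.
final class ArithmeticSketch {
    static int idiv(int a, int b)  { return a / b; }    // truncates toward zero
    static int irem(int a, int b)  { return a % b; }    // result takes the sign of the dividend
    static int ishl(int v, int s)  { return v << s; }   // only the low 5 bits of s are used
    static int ishr(int v, int s)  { return v >> s; }   // arithmetic shift, keeps the sign
    static int iushr(int v, int s) { return v >>> s; }  // logical shift, fills with zeros

    static int countTo(int n) {
        int i = 0;
        while (i < n) {
            i += 1;  // incrementing an int local by a small constant becomes iinc
        }
        return i;
    }
}
```

For example, idiv(-7, 2) is -3, irem(-7, 2) is -1, ishr(-8, 1) is -4 and iushr(-8, 28) is 15.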

Stack Manipulation Instructions

The JVM has a number of instructions that manipulate the top of the stack. The reason may be that the JVM has no registers whatsoever, and these instructions may help speed up certain operations. Whatever the reason, here they are: pop, pop2, dup, dup_x1, dup_x2, dup2, dup2_x1, dup2_x2 and swap. They are all supported in Jamaica.
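
To make the effect of a few of these instructions concrete, here is a toy operand stack in Java. It is only a sketch: the class OperandStackSketch is hypothetical, it models single-slot values only (no long or double), and it is not how any JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the operand stack for single-slot values.
final class OperandStackSketch {
    private final Deque<Object> stack = new ArrayDeque<>();

    void push(Object v) { stack.push(v); }
    Object pop()        { return stack.pop(); }

    // dup: ..., v1  ->  ..., v1, v1
    void dup() { stack.push(stack.peek()); }

    // swap: ..., v2, v1  ->  ..., v1, v2
    void swap() {
        Object v1 = stack.pop();
        Object v2 = stack.pop();
        stack.push(v1);
        stack.push(v2);
    }

    // dup_x1: ..., v2, v1  ->  ..., v1, v2, v1
    void dupX1() {
        Object v1 = stack.pop();
        Object v2 = stack.pop();
        stack.push(v1);
        stack.push(v2);
        stack.push(v1);
    }

    // Snapshot of the stack, top element first.
    List<Object> topFirst() { return new ArrayList<>(stack); }
}
```

Pushing 1 and then 2 and applying dupX1 leaves 2, 1, 2 (top first), which is the pattern needed to store a value into a field and still keep a copy of it on the stack.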

Method Invocation Instructions

The JVM has four method invocation instructions: invokevirtual is used to call methods of object instances; invokestatic is used to call static methods; invokeinterface is used to call interface methods; and invokespecial is used to call special methods such as constructors or methods of the super classes. They all share the same syntax in Jamaica:

( invokevirtual | invokestatic | invokeinterface | invokespecial )
[ ClassName . ] name MethodSignature
MethodSignature ::= ( [ DataType ( , DataType )* ] ) DataType
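
In the class file, a method signature ends up as a JVM method descriptor such as (I)V. The following Java sketch shows that mapping for signatures written in the style above. It is an illustration only, not Jamaica's actual code: the class DescriptorSketch is hypothetical, and it assumes class names are already fully qualified, whereas Jamaica accepts short names such as String and resolves them itself.

```java
import java.util.Map;

// Hypothetical helper: turns "(int, long[])java.lang.String" into "(I[J)Ljava/lang/String;".
final class DescriptorSketch {
    private static final Map<String, String> PRIMITIVES = Map.of(
            "void", "V", "boolean", "Z", "byte", "B", "char", "C", "short", "S",
            "int", "I", "long", "J", "float", "F", "double", "D");

    // Descriptor of one type; each trailing "[]" adds a leading '['.
    static String typeDescriptor(String type) {
        String t = type.trim();
        StringBuilder sb = new StringBuilder();
        while (t.endsWith("[]")) {
            sb.append('[');
            t = t.substring(0, t.length() - 2).trim();
        }
        String primitive = PRIMITIVES.get(t);
        sb.append(primitive != null ? primitive : "L" + t.replace('.', '/') + ";");
        return sb.toString();
    }

    // Descriptor of a whole signature: parameter types in parentheses, then the return type.
    static String methodDescriptor(String signature) {
        int open = signature.indexOf('(');
        int close = signature.lastIndexOf(')');
        String params = signature.substring(open + 1, close).trim();
        StringBuilder sb = new StringBuilder("(");
        if (!params.isEmpty()) {
            for (String p : params.split(",")) {
                sb.append(typeDescriptor(p));
            }
        }
        return sb.append(')').append(typeDescriptor(signature.substring(close + 1))).toString();
    }
}
```

The return type is part of the descriptor, so a signature whose return type does not match the real method names a method that does not exist.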

Unconditional Jump Instructions

Program execution can be unconditionally transferred to a location other than the next instruction by an absolute goto (or goto_w) instruction, by a return instruction in a method, or by an exception explicitly thrown via the athrow instruction. The syntax for these instructions is:

UnconditionalJump ::=
( goto | goto_w ) label |
( jsr | jsr_w ) label | ret |
return | <type>return |
athrow
where <type> is one of the following: i, l, f, d and a for integer, long, float, double and object reference.

The jsr (and jsr_w) and the companion ret make it possible to implement subroutines within methods. The label must point to an address within the method that calls jsr, and at the end of the subroutine there must be a ret. The Java language does not explicitly expose this JVM feature, although javac has commonly used it to implement the finally clause when there are multiple exit routes.

As usual, the wide versions of those instructions are optional; their "narrow" counterparts can be used in place of them and will be converted to wide if necessary.

The return is not necessary at the end of a method with a void return type. Jamaica inserts one if it is needed.

Conditional Jump Instructions

The following instructions compare two integers and jump accordingly:

if_icmp<op> label
where <op> is one of the following: eq for equal, ne for not-equal, lt for less-than, le for less-or-equal, gt for greater-than and ge for greater-or-equal.

For two objects, their equality or non-equality can be compared with these instructions:

( if_acmpeq | if_acmpne ) label

To compare two long, float or double values, you first need to invoke one of these instructions: lcmp, fcmpl, fcmpg, dcmpl and dcmpg, which compare the two values on the stack and leave an integer result on the stack top. Then, use one of the following to branch:

if<op> label
where <op> is one of the following: eq, ne, lt, le, gt and ge.
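
The integer left by the compare instructions is -1, 0 or 1. The l and g variants differ only in what they produce when either operand is NaN: fcmpl and dcmpl give -1, while fcmpg and dcmpg give 1. A minimal Java sketch of those rules, with illustrative method names:

```java
// Models the results of lcmp, fcmpl and fcmpg; dcmpl and dcmpg behave the same for double.
final class CompareSketch {
    static int lcmp(long a, long b) {
        return a > b ? 1 : (a == b ? 0 : -1);
    }

    // "l" variant: an unordered comparison (NaN involved) yields -1.
    static int fcmpl(float a, float b) {
        if (a > b) return 1;
        if (a == b) return 0;
        return -1;  // a < b, or at least one operand is NaN
    }

    // "g" variant: an unordered comparison (NaN involved) yields 1.
    static int fcmpg(float a, float b) {
        if (a < b) return -1;
        if (a == b) return 0;
        return 1;   // a > b, or at least one operand is NaN
    }
}
```

A compiler picks the variant so that a comparison involving NaN always takes the "false" branch, whichever way the test is written.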

To test whether an object reference is null or not, use these instructions:

( ifnull | ifnonnull ) label

JVM also defines two switch instructions that do multi-way branching.

( ( tableswitch | switch ) ( int_constant : label )* default : label
| lookupswitch int_constant ( label )* default : label )
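lookupswitch is a high-performance switch statement: the multiple choices must be consecutive numbers, so it needs only a first value and a list of labels. tableswitch (and its Jamaica synonym, switch) takes a number of integer constants and their associated labels. Jamaica optimizes this if the constant values happen to be consecutive. Note that the JVM specification assigns the two names the other way around: there, tableswitch is the jump-table form for a consecutive range of values, and lookupswitch is the form that stores pairs of match values and labels.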

Other Instructions

The nop instruction does nothing. It can be used as a placeholder for testing purposes.

There are two synchronization instructions, monitorenter and monitorexit, to implement object-based synchronization. Both take an object on the stack as their parameter. Method synchronization is denoted by the synchronized attribute, not through these instructions.
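
In Java source, the pair of monitor instructions corresponds to a synchronized block. The sketch below (the class MonitorSketch and its methods are made up for this illustration) guards a counter with such a block and exercises it from two threads.

```java
// A synchronized block compiles to monitorenter ... monitorexit on the lock object.
final class MonitorSketch {
    private final Object lock = new Object();
    private int count;

    void increment() {
        synchronized (lock) {  // monitorenter
            count++;
        }                      // monitorexit, also emitted on the exception path
    }

    // Runs two threads that each increment perThread times and returns the final count.
    static int runTwoThreads(int perThread) {
        MonitorSketch m = new MonitorSketch();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                m.increment();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return m.count;
    }
}
```

When writing these instructions by hand, every monitorenter needs a matching monitorexit on every exit path, including exception handlers.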

 


6. Introduction to Executable Macros

Jamaica executable macros (or simply, macros) greatly simplify JVM bytecode assembly programming. They cover these areas:

  1. print
  2. get and set values from/to constants, variables and data fields
  3. object and array creation
  4. string concatenation
  5. conditional branching for comparisons
  6. iteration of iterators and enumerations
Jamaica macros take advantage of the strong typing of variables. They treat individual variables, data fields, array elements and constants consistently. The object and array creation macros are assignable macros, meaning they can be used as the right-hand side values for the %set macro.

Executable Macro Parameters

Macro parameters can be a constant, a simple name or an array element expression. No other expressions are available (for now). Syntactically,

Param ::= Constant | name ( [ Param ] )*
The names in the parameters are resolved in this order: if a variable is found with that name, that variable is used; otherwise, if a data member (static or otherwise) has that name, that field is used. This is an example:

Listing 4. MacroTest.ja
public class MacroTest
{
  static int iSFld[];
  int idx;

  static {
    %set iSFld = %array int[]{ 2, 3, 4 }
  }

  void foo() {
    %set idx = 0
    %println "iSFld[iSFld[idx=", idx, "]] = ", iSFld[iSFld[idx]]
  }

  public static void main(String[] args) {
    %object MacroTest
    invokevirtual foo()void
  }
}
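
For readers who want to check what Listing 4 computes, here is a hand-written Java counterpart. It is a sketch for comparison only, not the code the macros expand to, and it returns the text instead of printing it so the result is easy to inspect.

```java
// Hand-written Java counterpart of MacroTest.ja; foo() returns the line instead of printing it.
final class MacroTestSketch {
    static int[] iSFld = { 2, 3, 4 };  // %set iSFld = %array int[]{ 2, 3, 4 }
    int idx;

    String foo() {
        idx = 0;                       // %set idx = 0
        // iSFld[idx] is 2, so the nested element is iSFld[2]
        return "iSFld[iSFld[idx=" + idx + "]] = " + iSFld[iSFld[idx]];
    }
}
```

The nested array element expression is what the Param grammar above allows: an array element whose index is itself a Param.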

%print, %println and %flush

The syntax for the print macro is:

( %println | %print | %flush ) [ < TargetName > ] [ Param ( , Param )* ]
The TargetName is either out (for System.out) or err (for System.err); the default is out. These macros take a variable number of parameters. For println, the whole list is printed without line breaks except for one at the end. For flush, the whole list is printed as with print, followed by a call to the flush() method.

%load

The syntax for the load macro is:

%load Param
The value, whether a constant, a variable, a field or an array element, is loaded onto the top of the stack.

%set

The syntax for the set macro is:

%set Param = Param
The right-hand side value, whether a constant, a variable, a field or an array element, is assigned to the variable, field or array element on the left-hand side.

%object

The syntax for the object creation macro is:

%object ClassName [ ( DataType ( , DataType )* ) ( Param ( , Param )* ) ]
This creates an object of that class on the stack top and invokes its constructor.

%array

There are two ways to create an array and put it onto the stack: one by specifying the dimensions, the other by giving initialization values for a single-dimensional array.

%array ClassName ( ( [ Param ] )+ ( [ ] )* | [ ] { Param ( , Param )* } )

%concat

This macro concatenates all the parameters into a single string and puts it onto the stack:

%concat Param ( , Param )*

%if, %else and %end_if

The if-else structure is familiar to any programmer. Jamaica supports all the comparison operations and handles types automatically. The syntax for the if-else macro is:

%if Param [ CompareOp Param ] CodeList [ %else CodeList ] %end_if
CompareOp ::= == | != | < | <= | > | >=
If the comparison expression is a single parameter, it is treated as a boolean and is tested as > 0.

%iterate and %end_iterate

The syntax of the iterate macro is:

%iterate Param [ IterateVarName ] CodeList %end_iterate
where Param must be evaluated to either a java.util.Iterator or a java.util.Enumeration. During each iteration, if the iterate variable is specified, that element is stored there; otherwise, it is put on the top of the stack. The types of the elements and the iterate variable must be compatible. E.g.,
public String toCSV(List list) {
  Iterator iter;
  %load list
  invokeinterface List.iterator()Iterator
  astore iter

  StringBuffer sb;
  %object StringBuffer
  dup
  astore sb
  dup
  boolean first;
  %set first = true
  %iterate iter
    %if first
      %set first = false
    %else
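      %load sb
      ldc ','
      invokevirtual StringBuffer.append(char)StringBuffer
      pop  // discard the StringBuffer that append returns
    %end_if
    // the stack top now holds: sb and the element
    invokevirtual StringBuffer.append(Object)StringBuffer
    // append leaves sb on the stack, so no dup is needed here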
  %end_iterate 
  pop
  invokevirtual StringBuffer.toString()String
  areturn
}
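
The same method written in plain Java may help when checking the stack bookkeeping in the Jamaica version. This is a sketch for comparison, with a made-up class name; it uses an explicit Iterator to mirror what %iterate walks over.

```java
import java.util.Iterator;
import java.util.List;

// Plain-Java counterpart of the toCSV example.
final class CsvSketch {
    static String toCSV(List<?> list) {
        StringBuffer sb = new StringBuffer();
        boolean first = true;
        for (Iterator<?> iter = list.iterator(); iter.hasNext(); ) {
            Object element = iter.next();
            if (first) {
                first = false;   // no separator before the first element
            } else {
                sb.append(',');
            }
            sb.append(element);  // append(Object) returns the same StringBuffer
        }
        return sb.toString();
    }
}
```

Calling toCSV with the list "a", "b", "c" gives a,b,c, and an empty list gives an empty string.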

%array_iterate and %end_iterate

The syntax of the array iterate macro is:

%array_iterate Param IterateIndexVarName CodeList %end_iterate
where Param must be evaluated to an array, and the index variable must be an int. In the iterations, the index variable is incremented from 0 to the array length minus one. E.g.,
int[] arr;
%set arr = %array int[]{ 9, 8, 7 }
int idx;
%array_iterate arr idx
  %println "arr[", idx, "]=", arr[idx]
%end_iterate
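
In Java terms, the array iteration above behaves like an indexed for loop. The sketch below uses illustrative names and collects the lines in a string rather than printing them, so the output can be checked.

```java
// Java counterpart of the %array_iterate example; each line ends with a newline.
final class ArrayIterateSketch {
    static String dump(int[] arr) {
        StringBuilder sb = new StringBuilder();
        for (int idx = 0; idx < arr.length; idx++) {  // idx runs from 0 to length - 1
            sb.append("arr[").append(idx).append("]=").append(arr[idx]).append('\n');
        }
        return sb.toString();
    }
}
```

For the array { 9, 8, 7 } this yields the three lines arr[0]=9, arr[1]=8 and arr[2]=7.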

 


7. Summary

Jamaica is a macro assembly language for the Java VM. It uses Java syntax for class and interface definitions, except for method bodies, where JVM bytecode instructions are used. Within a method body, variables can be defined and exception handlers can be specified. The instructions all use symbolic names for variables, fields and labels, and never the indices that the JVM instruction set defines. The details of a class file, such as the constant pool, are completely hidden, because the JVM specification defines the class-file structure so rigidly that programmers have neither the liberty nor much interest to handle them by hand. In addition, Jamaica supports a number of macros for common patterns that are intelligently expanded into sequences of instructions, hence the name Jamaica for the JVM Macro Assembler. This is possible because Jamaica is a strongly-typed language; for example, each named variable is declared with a type. Jamaica does not support creating inner classes or interfaces, but it can use existing ones.

Dynamically creating Java classes at the bytecode level is extremely tedious and error-prone. Jamaica removes most of the chores of managing class-file details and greatly simplifies this task. It is implemented by a Java API, JavaClassCreator, which is modeled after the Jamaica language itself, including macros. You can therefore use Jamaica to quickly specify and verify the process of dynamically creating Java classes and, once done, mechanically convert the Jamaica source code into a series of JavaClassCreator method calls to use in your Java software that creates classes dynamically.

 


8. Code Listings

  1. CFirstCls1.ja
  2. CFirstCls2.ja
  3. CSecondCls.ja
  4. MacroTest.ja




Copyright © 2001-2005 JudoScript.COM. All Rights Reserved.