While JIT compilers have an important place in a Java system, for frequently used applications it is better to use a more traditional "ahead-of-time" or batch compiler. While Java has been primarily touted as an internet/web language, many people are interested in using Java as an alternative to traditional languages such as C++, if the performance can be made adequate. For embedded applications it makes much more sense to pre-compile the Java program, especially if the program is to be in ROM.
So Cygnus is building a Java programming environment that is based on a conventional compiler, linker, and debugger, using Java-enhanced versions of the existing GNU programming tools.
The core tool is of course the compiler. This is "cc1java," a new gcc front-end. It has a similar structure to the existing front-ends and shares most of its code with them. The most unusual aspect of cc1java is that its "parser" reads *either* Java source files or Java bytecode files. (The first release will directly support only bytecodes; parsing Java source will be done by invoking Sun's javac. A future version will provide an integrated Java parser, mainly for the sake of compilation speed.) In any case, it is important that cc1java can read bytecodes, for at least three reasons: (1) it is the natural way to get declarations of external classes (in this respect a Java bytecode file is like a C++ pre-compiled header file); (2) it is needed so we can support code from other tools that produce Java bytecodes (such as the Kawa Scheme-to-Java-bytecode compiler); and (3) some libraries are (unfortunately) distributed as Java bytecodes without source.
"Parsing" a Java bytecode file starts with parsing the meta-data in the file. Each bytecode file defines one Java class, along with the superclass, fields, and methods of that class. We use this information to build the corresponding declarations and type nodes using mostly-standard gcc "tree" nodes. This information will also be used to generate the run-time meta-information (such as the Class data structure): the compiler generates initialized static data that has the same layout as the run-time data structures used by the Java VM. Thus startup is fast, and does not require allocating any data.
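As a rough illustration of the first step of reading that meta-data, the sketch below decodes only the fixed header at the start of a class file (magic number, version, and constant-pool count, as laid out in the Java class file format). The class name `ClassFileHeader` is hypothetical, and this is Java written for illustration, not cc1java's own code; the constant pool, superclass, fields, and methods that follow the header are not decoded here.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decodes only the fixed-size header of a class file.
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE; // first four bytes of every class file

    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader read(byte[] bytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int magic = in.getInt();
        if (magic != MAGIC) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = in.getShort() & 0xFFFF;   // u2 minor_version
        int major = in.getShort() & 0xFFFF;   // u2 major_version
        int cpCount = in.getShort() & 0xFFFF; // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }
}
```

A full reader would continue with the constant pool, since the class, superclass, field, and method entries all refer to it by index.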
The executable content of a bytecode file contains a vector of bytecode instructions for each (non-native) method. Code generation means converting the stack-oriented bytecodes into gcc expression nodes. The first problem is that we must know, for each instruction, the type of each operand (stack and local variable slots) in the Java virtual machine state. This is done with a process very similar to that of a Java bytecode verifier. Transforming postfix stack operations into expression nodes involves a compile-time stack of expression nodes. When necessary, we also map stack slots and local variables into gcc pseudo-registers.
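The compile-time stack idea can be sketched in a few lines. The example below is an assumption-laden toy, not the cc1java implementation: it handles only a handful of integer instructions, takes them as text mnemonics rather than encoded bytes, and uses hypothetical node classes (`Local`, `Const`, `BinOp`, `Store`) in place of gcc tree nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy translator from stack-oriented instructions to expression trees.
final class ExprBuilder {
    static abstract class Node {}

    static final class Local extends Node {
        final int slot;
        Local(int slot) { this.slot = slot; }
        @Override public String toString() { return "local" + slot; }
    }

    static final class Const extends Node {
        final int value;
        Const(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class BinOp extends Node {
        final char op; final Node left; final Node right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    static final class Store extends Node {
        final int slot; final Node value;
        Store(int slot, Node value) { this.slot = slot; this.value = value; }
        @Override public String toString() { return "local" + slot + " = " + value; }
    }

    static List<Node> translate(String[] code) {
        Deque<Node> stack = new ArrayDeque<>();     // compile-time stack of expressions
        List<Node> statements = new ArrayList<>();  // completed statements, in order
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iload":  stack.push(new Local(Integer.parseInt(parts[1]))); break;
                case "bipush": stack.push(new Const(Integer.parseInt(parts[1]))); break;
                case "iadd":
                case "imul": {
                    Node right = stack.pop(); // top of stack is the right operand
                    Node left = stack.pop();
                    stack.push(new BinOp(parts[0].equals("iadd") ? '+' : '*', left, right));
                    break;
                }
                case "istore":
                    statements.add(new Store(Integer.parseInt(parts[1]), stack.pop()));
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return statements;
    }
}
```

For the sequence `iload 1`, `iload 2`, `iadd`, `bipush 2`, `imul`, `istore 3`, the translator yields the single statement `local3 = ((local1 + local2) * 2)`. The toy ignores the cases where a value must outlive a single expression, for example when the stack is non-empty at a branch target; those are the situations in which stack slots would need to be mapped to pseudo-registers.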
Generating machine code from the expression nodes uses existing code (instruction generator, optimizer, and assembler).
Linking a set of compiled Java binaries into a library or executable will use the standard linker (GNU ld). However, some enhancements are necessary or at least desirable. The linker must provide a way to build a table mapping class names to Class objects. This can be done using the same mechanism used for running C++ static initializers. Linker help is also desirable to combine multiple copies of the same literal.
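The class-name table can be pictured with a small model. In the sketch below, each compiled unit contributes an initializer to a shared list, in the same spirit as the list of static initializers the linker collects for C++, and running that list at startup fills the name-to-Class table. The `ClassTable` class and its methods are hypothetical names chosen for illustration; the real mechanism would live in the linker and run-time, not in Java code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a name-to-Class table filled by startup initializers.
final class ClassTable {
    private static final Map<String, Class<?>> TABLE = new HashMap<>();
    private static final List<Runnable> INITIALIZERS = new ArrayList<>();

    // Stands in for a compiled unit adding an entry to the initializer list.
    static void addInitializer(Runnable initializer) {
        INITIALIZERS.add(initializer);
    }

    // Stands in for startup code running every collected initializer once.
    static void runInitializers() {
        for (Runnable initializer : INITIALIZERS) {
            initializer.run();
        }
        INITIALIZERS.clear();
    }

    static void register(Class<?> c) {
        TABLE.put(c.getName(), c);
    }

    // Returns null if no class of that name has been registered.
    static Class<?> lookup(String name) {
        return TABLE.get(name);
    }
}
```

A unit would call something like `ClassTable.addInitializer(() -> ClassTable.register(String.class))`, and after `ClassTable.runInitializers()` a lookup of `"java.lang.String"` returns the corresponding Class object.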
Running a compiled Java program will need a suitable Java run-time environment. This contains support for threads, garbage collection, and all the primitive Java methods. Complete Java support also means being able to dynamically load new bytecode classes. Hence the appropriate Java environment is basically a Java Virtual Machine. We are using the Kaffe free Java VM (written by Tim Wilkinson), but enhancing and modifying it to be more suitable for pre-compiled code. (For example, we are simplifying the data structures.) Kaffe includes a JIT compiler, which solves the problem of calling between pre-compiled and dynamically loaded methods (since both use the same calling convention).
We plan to enhance gdb (the GNU debugger) so it can understand Java-compiled code. This may involve accessing Java meta-data from the Java executable. We may also enhance gdb to understand dynamically-loaded bytecodes, but the need for that is reduced if we instead provide a hook so gdb knows about JIT-compiled code.