<?xml version="1.0" encoding="utf-8"?>
<!--<!DOCTYPE article 
  PUBLIC "-//Norman Walsh//DTD Simplified DocBk XML V3.1.3.6//EN" 
  "/home/bothner/sgml/xdocbook/docbookx.dtd">-->
<article>
<artheader>
<title>Java/C++ integration</title>
<subtitle>Writing native Java methods in natural C++</subtitle>
<authorgroup>
<author>
<firstname>Per</firstname><surname>Bothner</surname>
<affiliation>
<orgname>Brainfood</orgname>
<address>
<email>bothner@gnu.org</email>
<street>??</street>
<city>??</city>, <state>TX</state> <postcode>??</postcode>,
<country>USA</country>
</address>
</affiliation>
</author>
<author>
<firstname>Tom</firstname><surname>Tromey</surname>
<affiliation>
<orgname>Red Hat</orgname>
<address>
<email>tromey@redhat.com</email>
<street>??</street>
<city>??</city>, <state>TX</state> <postcode>??</postcode>,
<country>USA</country>
</address>
</affiliation>
</author>
</authorgroup>
<date>November, 2000</date>
</artheader>

<abstract><para>
<quote>Native methods</quote> in Java are methods written in some other
language, usually C or C++.  Sun included in JDK 1.1 the
<quote>Java Native Interface</quote> (<acronym>JNI</acronym>) for
writing such native methods in a portable way, independent of JVM
implementation.  JNI satisfies that goal, but has two major problems.
First, JNI code has major inherent inefficiencies, because everything has to be
done as calls through a run-time table of function pointers.
Second, using JNI is very verbose, tedious, and error-prone for the programmer.
</para>
<para>
The GNU Compiler for the Java platform (<acronym>GCJ</acronym>)
is based on compiling Java to machine code using the
GNU Compiler Collection (<acronym>GCC</acronym>) framework.
In addition to JNI, GCJ offers an alternative, the Compiled Native Interface
(CNI).  CNI is based on the idea of
making the C++ and Java data representations and calling conventions
as close as practical, and using a slightly modified Java-aware
C++ compiler to compile native methods written in C++.
CNI code is both very efficient and very easy and natural to write,
because it uses standard C++ syntax and idioms to work with Java data.
The runtime and class library associated with GCJ is libgcj, which is written
in a mix of Java and C++ code using CNI.</para>
</abstract>

<sect1><title>Background</title>
<para>
Not all the code in a Java application can be written in Java.  Some
must be written in a lower-level language, either for efficiency
reasons, or to access low-level facilities not accessible in Java.
For this reason, Java methods may be specified as <quote>native</quote>.
This means that the method has no method body (implementation)
in the Java source code.  Instead, it has a special flag which
tells the Java virtual machine to look for the method using
some unspecified lookup mechanism.
</para>
<!--<para>
Assymmetrix had a Supercede Java environment that boasts
<quote>seamless</quote> C++/Java integration.
That needs to be investigated.</para>-->
<sect2><title>The Java Native Interface</title>
<para>
Sun's original Java Development Kit (JDK) version 1.0 defined a
programming interface for writing native methods in C.
This provided rather direct and efficient access to the underlying
VM, but was not officially documented, and was tied to specifics
of the VM implementation.  There was little attempt to make it an
abstract <acronym>API</acronym> that could work with any VM.</para>
<para>
For JDK 1.1, Sun defined the <quote>Java Native Interface</quote>
(<acronym>JNI</acronym>), the
official portable programming interface for writing such
<quote>native methods</quote> in C or C++.
This is a binary interface (<acronym>ABI</acronym>), allowing someone
to ship a library of compiled <acronym>JNI</acronym> native code,
and have it work with any VM implementation (for that platform).</para>
<para>
The problem with JNI is that it is a rather heavy-weight
interface, with major run-time overheads.
It is also very tedious to write code using JNI.
For example, for native code to access a field in an object,
it needs to make two function calls
(though the result of the first can be saved for future accesses).
This is cumbersome to write and slow at run-time.</para>
<para>
To specify a field in JNI, you pass its name as a string to a run-time
routine that searches <quote>reflective</quote> data structures.
Thus the JNI requires the availability at run-time of complete reflective
data (names, types, and positions of all fields, methods, and classes).
The reflective data has other uses (there is a standard set of Java
classes for accessing the reflective data), but when memory is tight,
as in an embedded system, it is a luxury many applications do not need.
</para>
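<para>
The cost described above can be made concrete with a small self-contained
model.  The sketch below does not use the real <filename>jni.h</filename>;
every name in it (<literal>Env</literal>, <literal>FunctionTable</literal>,
<literal>FieldInfo</literal>, <literal>TimerObject</literal>) is a
hypothetical stand-in chosen for illustration.  It contrasts a field read
that goes through a function table and a run-time lookup by name with a
direct load at a compile-time offset.</para>
<programlisting>
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;cstring&gt;

// Hypothetical stand-ins, NOT the real JNI declarations.
struct TimerObject { std::int64_t lastTime; };

struct FieldInfo { const char *name; std::size_t offset; };

// "Reflective" data that must exist at run time for lookup by name.
static const FieldInfo timer_fields[] = {
  { "lastTime", offsetof(TimerObject, lastTime) },
};

struct Env;
struct FunctionTable
{
  const FieldInfo *(*GetFieldID)(Env *, const char *name);
  std::int64_t (*GetLongField)(Env *, void *obj, const FieldInfo *id);
};
struct Env { const FunctionTable *functions; };

static const FieldInfo *get_field_id(Env *, const char *name)
{
  for (const FieldInfo &amp;f : timer_fields)   // run-time string search
    if (std::strcmp(f.name, name) == 0)
      return &amp;f;
  return nullptr;
}

static std::int64_t get_long_field(Env *, void *obj, const FieldInfo *id)
{
  std::int64_t value;
  std::memcpy(&amp;value, static_cast&lt;char *&gt;(obj) + id-&gt;offset, sizeof value);
  return value;
}

static const FunctionTable table = { get_field_id, get_long_field };

Env make_env() { return Env{ &amp;table }; }

// Table style: two indirect calls, the first a lookup by name.
std::int64_t read_indirect(Env *env, TimerObject *obj)
{
  const FieldInfo *id = env-&gt;functions-&gt;GetFieldID(env, "lastTime");
  assert(id != nullptr);
  return env-&gt;functions-&gt;GetLongField(env, obj, id);
}

// Direct style: a plain load at an offset known to the compiler.
std::int64_t read_direct(const TimerObject *obj)
{
  return obj-&gt;lastTime;
}
</programlisting>
<para>
Both functions return the same value, but only the first needs the name
table to be present at run time, and only the second can be inlined by
the compiler.</para>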
<para>
As an example, here is a small Java class
intended for timing purposes.  (This could be written in portable
Java, but let us assume for some reason we don't want to do that.)
<programlisting>
package timing;
class Timer {
  private long lastTime;
  private String lastNote;

  /** Return time in milliseconds
   * since last call,
   * and set lastNote. */
  native long sinceLast(String note);
}
</programlisting>
Figure 1 shows how it could be programmed using the <acronym>JNI</acronym>.
Note the first <literal>env</literal> parameter, which is a pointer to
a thread-specific area, which also includes a pointer to a table of
functions.  The entire JNI is defined in terms of these functions,
which cannot be inlined (since that would make JNI methods no
longer binary compatible across VMs).
</para>
<!--<figure><title>Native code for <classname>Timer</classname> using JNI</title>-->
<programlisting> 
#include &lt;jni.h&gt;

// Assumed to be defined elsewhere.
jlong calculate_new_time();

extern "C" JNIEXPORT jlong JNICALL
Java_timing_Timer_sinceLast (
    JNIEnv *env, /* interface pointer */
    jobject obj, /* "this" pointer */
    jstring note) /* argument #1 */
{
  // Note that the results of the first
  // three statements could be saved for
  // future use (though the class reference
  // has to be made "global" first).
  jclass cls =
     env->FindClass("timing/Timer");
  jfieldID lTid =
     env->GetFieldID(cls, "lastTime",
                 "J");
  jfieldID lNid =
     env->GetFieldID(cls, "lastNote",
                 "Ljava/lang/String;");

  jlong oldTime =
    env->GetLongField(obj, lTid);
  jlong newTime =
    calculate_new_time();
  env->SetLongField(obj, lTid, newTime);
  env->SetObjectField(obj, lNid, note);
  return newTime - oldTime;
}
</programlisting>
<!--</figure>-->
<para>
GCJ supports JNI, but it also offers
a more efficient, lower-level, and more natural native API,
which we call CNI, for <quote>Compiled
Native Interface</quote>.  (It can also stand for Cygnus Native Interface,
since CNI was designed at Cygnus Solutions before Cygnus was acquired
by Red Hat.)
The basic idea is to make GNU Java compatible with GNU C++ (G++), and provide
a few hooks in G++ so C++ code can access Java objects as naturally
as native C++ objects.  The rest of this paper goes into details
about this integrated Java/C++ model.  The key idea is that the
calling conventions and data accesses for CNI are the same as for
normal nonnative Java methods.  Thus there is no extra
<classname>JNIEnv</classname> parameter, and the C++ programmer gets
direct access to the Java objects.  This does require co-ordination
between the C++ and Java implementations.
</para>
<para>
Below is the earlier example written using CNI.</para>
<!--<figure><title>Native code for <classname>Timer</classname> using CNI</title>-->
<programlisting>
#include "timing/Timer.h"

jlong
timing::Timer::sinceLast(jstring note)
{
  jlong oldTime = this->lastTime;
  jlong newTime = calculate_new_time();
  this->lastTime = newTime;
  this->lastNote = note;
  return newTime - oldTime;
}
</programlisting>
<!--</figure>-->
<para>
This uses the automatically generated header
<filename>timing/Timer.h</filename>:</para>
<!--<figure><title>Automatically-generated <filename>timing/Timer.h</filename></title>-->
<programlisting>
#include &lt;cni.h&gt;
class ::timing::Timer
  : public ::java::lang::Object
{
  jlong lastTime;
  jstring lastNote;
public:
  virtual jlong sinceLast(jstring note);
};
</programlisting>
<!--</figure>-->
</sect2>
</sect1>

<sect1><title>API vs ABI</title>
<para>
A fundamental goal of JNI was that it should be independent of the JVM;
it should be possible to implement JNI on any reasonable
JVM implementation.  CNI can also in principle be implemented on
any reasonable Java implementation, by putting sufficient knowledge
in the C++ compiler.  This is possible because the C++ compiler
can distinguish C++ and Java types, and thus use different representations
for C++ and Java objects.  However, CNI works more naturally
the closer the C++ and Java data representations are.
For GCJ our goal was to make the Java ABI (Application Binary Interface)
as close to the C++ ABI as made sense.</para>
<para>
Another goal of JNI was to define a portable ABI, rather than just an API
(Application Programming Interface).
That is, for any given platform (machine and OS), compiled JNI code should
not depend on the JVM implementation.
However, since JNI is defined in terms of C data types and function
calls, it does depend on the C ABI of the given platform.
One might say that JNI was designed for applications delivered in
compiled form, presumably on some small number of platforms.
It seems a questionable tradeoff to accept the overheads and inconvenience
of JNI for this very restricted form of binary portability,
especially to those of us who believe source should be available.</para>
<para>
That is not to say that we think an ABI is undesirable; far from it.
We think that for now an ABI may be premature, and an API such as CNI
may make more sense.  Note though that much of the C++ community is moving
towards a stable defined ABI, and this will become the default for the
forthcoming GCC 3.0.  At that point, it may make sense to define a Java
ABI partly in terms of a C++ ABI.  CNI may be viewed as a precursor to
that.</para>
<para>
If we view CNI as an ABI for Java, it nails down a number of aspects of
the Java implementation (such as field layout and exception handling).
CNI leaves other parts of the implementation, such as object allocation
and synchronization, unspecified but defines portable hooks.  These hooks
define an ABI as long as the hooks are function calls, but if the hooks
are macros or get inlined to access implementation-specific fields,
then binary compatibility is gone.</para>
<para>
CNI as currently implemented assumes a conservative garbage collector.
For example, CNI lets you loop through an array without having to be
aware of garbage collection issues.  While this seems to prohibit a
copying collector, actually it does not.  Rather, it means that
if a copying collector is used, then the C++ compiler has to be aware
of the fact, and generate the needed tables so the collector can update
all registers and memory locations that point at a moved object.</para>
<para>
A related, more general disadvantage of GCJ concerns
<quote>Binary Compatibility</quote> as discussed in the Java Language
Specification [??].  
GCJ-compiled classes are much more vulnerable to breaking if a class
they depend on is changed and re-compiled than
Java <literal>class</literal> files are.  Adding a private member to a
base class changes the offsets of fields in a class, which means the
generated code is changed.  This is the same as for C++, and is a
cost of GCJ-style compilation you pay in exchange for performance.
There are techniques that some people use to reduce these problems in C++ [??];
similar techniques may be applicable to GCJ and hence CNI.  One possibility
is that the offset of a field in a structure be compiled into a
link-time constant, which would be resolved by the (static or dynamic) linker.
That would reduce binary compatibility problems quite a bit, though
it may produce slightly less optimal code.</para>
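<para>
The link-time-constant idea can be sketched in ordinary C++.  This is only
an illustration under stated assumptions: the class
<literal>Base</literal> and the symbol
<literal>Base_count_offset</literal> are hypothetical, and the symbol is
defined in the same file to keep the sketch self-contained, whereas in the
scheme described above it would be supplied by the library that owns the
class and resolved by the linker.</para>
<programlisting>
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;cstring&gt;

// Hypothetical layout of a class owned by some library.
struct Base { void *vtable; std::int32_t count; };

// Client code sees only this declaration: the offset is a symbol,
// not a number baked into the client's instructions.
extern const std::size_t Base_count_offset;

// The library side provides the value.  (Defined here only so that
// the sketch links on its own.)
const std::size_t Base_count_offset = offsetof(Base, count);

// Field read that stays valid if the library later moves the field.
std::int32_t load_count(const void *obj)
{
  assert(obj != nullptr);
  std::int32_t value;
  std::memcpy(&amp;value,
              static_cast&lt;const char *&gt;(obj) + Base_count_offset,
              sizeof value);
  return value;
}
</programlisting>
<para>
The extra load of the offset is the <quote>slightly less optimal
code</quote> mentioned above; in exchange, adding a field to the library
class would not require recompiling the client.</para>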

</sect1>

<sect1><title>Utility functions and macros</title>
<para>Both JNI and CNI provide toolkits of utility
functions so native code can request various services of the VM.
CNI uses the C++ syntax for operations that have a direct correspondence
in C++ (such as accessing an instance field or throwing an exception).
For other features, such as creating a Java string from a nul-terminated
C string, we need utility functions or macros.
Many of these have names and functionality similar to those of the JNI functions,
except that they do not depend on a <literal>JNIEnv</literal> pointer.
</para>
<para>
For example, the JNI interface to create a Java string from a C string is
the following in C:</para>
<programlisting>
jstring str =
  (*env)->NewStringUTF(env, "Hello");
</programlisting>
<para>
The JNI C++ interface is just a set of inline methods that wrap the C
interface, for example:</para>
<programlisting>
jstring str = env->NewStringUTF("Hello");
</programlisting>
<para>
In the CNI, we do not use a <literal>JNIEnv</literal> pointer, so the
usage is:</para>
<programlisting>
jstring str = JvNewStringUTF("Hello");
</programlisting>
<para>
In general, <acronym>CNI</acronym> functions and macros start with the
`<literal>Jv</literal>' prefix, for example the function
`<literal>JvNewObjectArray</literal>'.  This convention is used to
avoid conflicts with other libraries.
Internal functions in <acronym>CNI</acronym> start with the prefix
`<literal>_Jv_</literal>';  names with this prefix are reserved to the
implementation according to the C and C++ standards.</para>

<sect2><title>Strings</title>
<para>
As an illustration of the available utility functions,
consider those that <acronym>CNI</acronym> provides for
working with Java <literal>String</literal> objects.
The names and interfaces are analogous to those of <acronym>JNI</acronym>.
</para>

<para>
<funcsynopsis>
  <funcdef>jstring <function>JvNewString</function></funcdef>
  <paramdef>const jchar *<parameter>chars</parameter></paramdef>
  <paramdef>jsize <parameter>len</parameter></paramdef>
  </funcsynopsis>
  Creates a new Java String object, where
  <parameter>chars</parameter> are the contents, and
  <parameter>len</parameter> is the number of characters.
</para>

<para>
<funcsynopsis>
  <funcdef>jstring <function>JvNewStringLatin1</function></funcdef>
  <paramdef>const char *<parameter>bytes</parameter></paramdef>
  <paramdef>jsize <parameter>len</parameter></paramdef>
 </funcsynopsis>
  Creates a new Java String object, where <parameter>bytes</parameter>
  are the Latin-1 encoded
  characters, and <parameter>len</parameter> is the length of
  <parameter>bytes</parameter>, in bytes.
</para>

<para>
<funcsynopsis>
  <funcdef>jstring <function>JvNewStringLatin1</function></funcdef>
  <paramdef>const char *<parameter>bytes</parameter></paramdef>
  </funcsynopsis>
  Like the first <function>JvNewStringLatin1</function>,
  but computes <parameter>len</parameter>
  using <literal>strlen</literal>.
</para>

<para>
<funcsynopsis>
  <funcdef>jstring <function>JvNewStringUTF</function></funcdef>
  <paramdef>const char *<parameter>bytes</parameter></paramdef>
  </funcsynopsis>
   Creates a new Java String object, where <parameter>bytes</parameter> are
   the UTF-8 encoded characters of the string, terminated by a null byte.
</para>

<para>
<funcsynopsis>
   <funcdef>jchar *<function>JvGetStringChars</function></funcdef>
  <paramdef>jstring <parameter>str</parameter></paramdef>
  </funcsynopsis>
   Returns a pointer to the array of characters which make up a string.
</para>

<para>
<funcsynopsis>
   <funcdef> int <function>JvGetStringUTFLength</function></funcdef>
  <paramdef>jstring <parameter>str</parameter></paramdef>
  </funcsynopsis>
   Returns the number of bytes required to encode the contents
   of <parameter>str</parameter> as UTF-8.
</para>

<para>
<funcsynopsis>
  <funcdef> jsize <function>JvGetStringUTFRegion</function></funcdef>
  <paramdef>jstring <parameter>str</parameter></paramdef>
  <paramdef>jsize <parameter>start</parameter></paramdef>
  <paramdef>jsize <parameter>len</parameter></paramdef>
  <paramdef>char *<parameter>buf</parameter></paramdef>
  </funcsynopsis>
  This puts the UTF-8 encoding of a region of the
  string <parameter>str</parameter> into
  the buffer <parameter>buf</parameter>.
  The region of the string to fetch is specified by
  <parameter>start</parameter> and <parameter>len</parameter>.
   It is assumed that <parameter>buf</parameter> is big enough
   to hold the result.  Note
   that <parameter>buf</parameter> is <emphasis>not</emphasis> nul-terminated.
</para>
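<para>
The contract of the last two functions (query the length, fill a buffer
that is <emphasis>not</emphasis> nul-terminated) can be modelled without
the CNI headers.  The sketch below is an assumption-laden stand-in, not
the libgcj implementation: <literal>utf8_length</literal>,
<literal>utf8_region</literal> and <literal>to_utf8</literal> are
hypothetical names, strings are plain <literal>char16_t</literal> arrays,
and each 16-bit unit is encoded on its own, so surrogate pairs are not
combined.</para>
<programlisting>
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;string&gt;

// Bytes needed for one 16-bit code unit, encoded on its own.
static std::size_t utf8_unit_size(char16_t c)
{
  if (c &lt; 0x80) return 1;
  if (c &lt; 0x800) return 2;
  return 3;
}

// Length query over chars[0 .. len).
std::size_t utf8_length(const char16_t *chars, std::size_t len)
{
  std::size_t total = 0;
  for (std::size_t i = 0; i &lt; len; ++i)
    total += utf8_unit_size(chars[i]);
  return total;
}

// Region fetch: encodes chars[start .. start+len) into buf and returns
// the number of bytes written.  No terminator is appended.
std::size_t utf8_region(const char16_t *chars, std::size_t start,
                        std::size_t len, char *buf)
{
  char *out = buf;
  for (std::size_t i = start; i &lt; start + len; ++i)
    {
      char16_t c = chars[i];
      if (c &lt; 0x80)
        *out++ = static_cast&lt;char&gt;(c);
      else if (c &lt; 0x800)
        {
          *out++ = static_cast&lt;char&gt;(0xC0 | (c &gt;&gt; 6));
          *out++ = static_cast&lt;char&gt;(0x80 | (c &amp; 0x3F));
        }
      else
        {
          *out++ = static_cast&lt;char&gt;(0xE0 | (c &gt;&gt; 12));
          *out++ = static_cast&lt;char&gt;(0x80 | ((c &gt;&gt; 6) &amp; 0x3F));
          *out++ = static_cast&lt;char&gt;(0x80 | (c &amp; 0x3F));
        }
    }
  return static_cast&lt;std::size_t&gt;(out - buf);
}

// Typical caller: size the buffer first, then fill it.  std::string
// supplies the terminator that the region fetch leaves out.
std::string to_utf8(const char16_t *chars, std::size_t len)
{
  std::string buf(utf8_length(chars, len), '\0');
  std::size_t written = utf8_region(chars, 0, len, &amp;buf[0]);
  assert(written == buf.size());
  return buf;
}
</programlisting>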
</sect2>
</sect1>

<sect1><title>Object model</title>
<para>
In terms of language features, Java is in essence a subset of C++.
Java has a few important extensions, plus a powerful standard
class library, but on the whole that does not change the basic similarity.
Java is a hybrid object-oriented language, with a few native types,
in addition to class types.  It is class-based, where a class may have
static as well as per-object fields, and static as well as instance methods.
Non-static methods may be virtual, and may be overloaded.  Overloading is
resolved at compile time by matching the actual argument types against
the parameter types.  Virtual methods are implemented using indirect calls
through a dispatch table (virtual function table).  Objects are
allocated on the heap, and initialized using a constructor method.
Classes are organized in a package hierarchy.
</para>
<para>
All of the listed attributes are also true of C++, though C++ has
extra features (for example in C++ objects may also be allocated statically
or in a local stack frame in addition to the heap).
Because <acronym>GCJ</acronym> uses the same compiler technology as
<acronym>g++</acronym> (the GNU C++ compiler), it is possible
to make the intersection of the two languages use the same
<acronym>ABI</acronym> (object representation and calling conventions).
The key idea in <acronym>CNI</acronym> is that Java objects are C++ objects,
and all Java classes are C++ classes (but not the other way around).
So the most important task in integrating Java and C++ is to
remove gratuitous incompatibilities.
</para>

<sect2><title>Primitive types</title>
<para>
Java provides 8 <quote>primitive</quote> types:
<literal>byte</literal>, <literal>short</literal>, <literal>int</literal>,
<literal>long</literal>, <literal>float</literal>, <literal>double</literal>,
<literal>char</literal>, and <literal>boolean</literal>.
These are the same as the following C++ <literal>typedef</literal>s
(which are defined in a standard header file):
<literal>jbyte</literal>, <literal>jshort</literal>, <literal>jint</literal>,
<literal>jlong</literal>, <literal>jfloat</literal>,
<literal>jdouble</literal>,
<literal>jchar</literal>, and <literal>jboolean</literal>.
</para>

<informaltable frame="all" colsep="1" rowsep="0">
<tgroup cols="3">
<thead>
<row>
<entry>Java type</entry>
<entry>C++ name</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>byte</entry>
<entry>jbyte</entry>
<entry>8-bit signed integer</entry>
</row>
<row>
<entry>short</entry>
<entry>jshort</entry>
<entry>16-bit signed integer</entry>
</row>
<row>
<entry>int</entry>
<entry>jint</entry>
<entry>32-bit signed integer</entry>
</row>
<row>
<entry>long</entry>
<entry>jlong</entry>
<entry>64-bit signed integer</entry>
</row>
<row>
<entry>float</entry>
<entry>jfloat</entry>
<entry>32-bit IEEE floating-point</entry>
</row>
<row>
<entry>double</entry>
<entry>jdouble</entry>
<entry>64-bit IEEE floating-point</entry>
</row>
<row>
<entry>char</entry>
<entry>jchar</entry>
<entry>16-bit Unicode character</entry>
</row>
<row>
<entry>boolean</entry>
<entry>jboolean</entry>
<entry>logical (Boolean) values</entry>
</row>
<row>
<entry>void</entry>
<entry>void</entry>
<entry>no value</entry>
</row>
</tbody></tgroup>
</informaltable>
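<para>
The widths in the table can be checked at compile time.  The typedefs
below are hypothetical (<literal>sketch_</literal> prefix) and are built
from the standard fixed-width types; the real definitions come from the
CNI headers and may be spelled differently.</para>
<programlisting>
#include &lt;cassert&gt;
#include &lt;climits&gt;
#include &lt;cstdint&gt;
#include &lt;limits&gt;

// Hypothetical typedefs for illustration only.
typedef std::int8_t   sketch_jbyte;
typedef std::int16_t  sketch_jshort;
typedef std::int32_t  sketch_jint;
typedef std::int64_t  sketch_jlong;
typedef float         sketch_jfloat;
typedef double        sketch_jdouble;
typedef char16_t      sketch_jchar;
typedef bool          sketch_jboolean;

static_assert(sizeof(sketch_jbyte)  * CHAR_BIT == 8,  "byte is 8 bits");
static_assert(sizeof(sketch_jshort) * CHAR_BIT == 16, "short is 16 bits");
static_assert(sizeof(sketch_jint)   * CHAR_BIT == 32, "int is 32 bits");
static_assert(sizeof(sketch_jlong)  * CHAR_BIT == 64, "long is 64 bits");
static_assert(std::numeric_limits&lt;sketch_jfloat&gt;::is_iec559,
              "float is IEEE");
static_assert(std::numeric_limits&lt;sketch_jdouble&gt;::is_iec559,
              "double is IEEE");
static_assert(sketch_jchar(-1) &gt; 0, "char is unsigned");

// Run-time spot check of two ranges from the table.
void check_ranges()
{
  assert(std::numeric_limits&lt;sketch_jbyte&gt;::min() == -128);
  assert(std::numeric_limits&lt;sketch_jchar&gt;::max() == 0xFFFF);
}
</programlisting>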
<para>
<funcsynopsis>
<funcdef><function>JvPrimClass</function></funcdef>
<paramdef><parameter>primtype</parameter></paramdef>
</funcsynopsis>
This is a macro whose argument should be the name of a primitive
type, <foreignphrase><abbrev>e.g.</abbrev></foreignphrase>
<literal>byte</literal>.
The macro expands to a pointer to the <literal>Class</literal> object
corresponding to the primitive type.
<foreignphrase><abbrev>E.g.</abbrev></foreignphrase>,
<literal>JvPrimClass(void)</literal>
has the same value as the Java expression
<literal>Void.TYPE</literal> (or <literal>void.class</literal>).
</para>
</sect2>

<sect2><title>Classes</title>
<para>
All Java classes are derived from <literal>java.lang.Object</literal>.
C++ does not have a unique <quote>root</quote> class, but we use
a C++ <literal>java::lang::Object</literal> as the C++ version
of the <literal>java.lang.Object</literal> Java class.  All
other Java classes are mapped into corresponding C++ classes
derived from <literal>java::lang::Object</literal>.</para>
<para>
We consider a Java class such as <classname>java.lang.String</classname>
and the corresponding C++ class <classname>java::lang::String</classname>
to be the <emphasis>same</emphasis> class, just using different syntax.</para>
<para>
Interface inheritance (the <quote><literal>implements</literal></quote>
keyword) is currently not reflected in the C++ mapping.</para>
</sect2>

<sect2><title>Object references</title>
<para>
We implement a Java object reference as a pointer to the start
of the referenced object.  It maps to a C++ pointer.
(We cannot use C++ references for Java references, since
once a C++ reference has been initialized, you cannot change it to
point to another object.)
The <literal>null</literal> Java reference maps to the <literal>NULL</literal>
C++ pointer.
</para>
<para>
The original JDK implemented an object reference as
a pointer to a two-word <quote>handle</quote>.  One word of the handle
points to the fields of the object, while the other points
to a method table.  GNU Java, like many newer Java implementations,
does not use this extra indirection.
</para>
</sect2>

<sect2><title>Casts and Runtime Type Safety</title>
<para>
Java casts do runtime type checking when downcasting.  GCJ
automatically inserts calls to runtime functions to perform these
checks as appropriate.  When writing
CNI code, this checking is not done automatically.  C++ code which
must check this can call
<literal>java::lang::Class::isAssignableFrom</literal>.
</para>
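<para>
The kind of explicit check that C++ code has to write can be sketched with
a miniature class hierarchy.  Nothing here is the libgcj API:
<literal>ClassInfo</literal>, <literal>Object</literal> and
<literal>checked_cast</literal> are hypothetical, and a C++
<literal>std::runtime_error</literal> stands in for the Java
<literal>ClassCastException</literal>.</para>
<programlisting>
#include &lt;cassert&gt;
#include &lt;stdexcept&gt;

// Hypothetical miniature of run-time class metadata.
struct ClassInfo
{
  const char *name;
  const ClassInfo *superclass;

  // True if a value of class `other` may be stored in a variable
  // whose type is this class.
  bool isAssignableFrom(const ClassInfo *other) const
  {
    for (const ClassInfo *c = other; c != nullptr; c = c-&gt;superclass)
      if (c == this)
        return true;
    return false;
  }
};

struct Object { const ClassInfo *klass; };

static const ClassInfo object_class  = { "Object",  nullptr };
static const ClassInfo number_class  = { "Number",  &amp;object_class };
static const ClassInfo integer_class = { "Integer", &amp;number_class };
static const ClassInfo string_class  = { "String",  &amp;object_class };

// Explicit downcast check, standing in for what compiled Java code
// gets inserted automatically.
Object *checked_cast(const ClassInfo *target, Object *obj)
{
  assert(target != nullptr);
  if (obj != nullptr &amp;&amp; !target-&gt;isAssignableFrom(obj-&gt;klass))
    throw std::runtime_error("ClassCastException");
  return obj;   // a null reference passes any cast, as in Java
}
</programlisting>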
</sect2>

<sect2><title>Object fields</title>
<para>
Each object contains an object header, followed by the instance
fields of the class, in order.  The object header consists of
a single pointer to a dispatch or virtual function table.
(There may be extra fields <quote>in front of</quote> the object,
for example for memory management, but this is invisible to the programmer,
and the reference to the object points to the word containing
the dispatch table pointer.)
</para>
<para>
The fields are laid out in the same order, alignment, and size
as in C++.  Specifically, 8-bit and 16-bit native types
(<literal>byte</literal>, <literal>short</literal>, <literal>char</literal>,
and <literal>boolean</literal>) are <emphasis>not</emphasis>
widened to 32 bits, even though the
Java VM does extend 8-bit and 16-bit types to 32 bits
when on the VM stack or temporary registers.
</para>
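<para>
A flat C++ structure shows the layout rule.  The type
<literal>SampleObject</literal> is hypothetical, and the offsets in the
comments are what common ABIs produce (2-byte alignment for 16-bit values,
4-byte alignment for 32-bit values); they are not guaranteed by the C++
standard.</para>
<programlisting>
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;

// Hypothetical flat model: header word first, then fields in order.
struct SampleObject
{
  const void  *vtable;  // object header: dispatch table pointer
  std::int8_t  b;       // Java byte, stays 8 bits wide
  std::int16_t s;       // Java short, stays 16 bits wide
  std::int32_t i;       // Java int
};

// The header is always at offset zero in a standard-layout struct.
static_assert(offsetof(SampleObject, vtable) == 0, "header comes first");

std::size_t offset_of_b() { return offsetof(SampleObject, b); }
std::size_t offset_of_s() { return offsetof(SampleObject, s); }
std::size_t offset_of_i() { return offsetof(SampleObject, i); }

// On common ABIs the small fields pack into one 32-bit slot
// instead of being widened to 32 bits each.
void check_packing()
{
  assert(offset_of_i() - offset_of_b() == 4);
}
</programlisting>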
<para>
If you include the <literal>gcjh</literal>-generated header for a
class, you can access fields of Java classes in the <quote>natural</quote>
way.  Given the following Java class:
<programlisting>
public class Int
{
  public int i;
  public Int (int i) { this.i = i; }
  public static Int zero = new Int(0);
}
</programlisting>
you can write:
<programlisting>
#include &lt;gcj/cni.h&gt;
#include &lt;Int.h&gt;
Int*
mult (Int *p, jint k)
{
  if (k == 0)
    // static member access.
    return Int::zero;
  return new Int(p->i * k);
}
</programlisting>
</para>
<para>
<acronym>CNI</acronym> does not strictly enforce the Java access
specifiers, because Java permissions cannot be directly mapped
into C++ permissions.  Private Java fields and methods are mapped
to private C++ fields and methods, but other fields and methods
are mapped to public fields and methods.
</para>
</sect2>

<sect2><title>Arrays</title>
<para>
While in many ways Java is similar to C and C++,
it is quite different in its treatment of arrays.
C arrays are based on the idea of pointer arithmetic,
which would be incompatible with Java's security requirements.
Java arrays are true objects (array types inherit from
<literal>java.lang.Object</literal>).  An array-valued variable
is one that contains a reference (pointer) to an array object.
</para>
<para>
Referencing a Java array in C++ code is done using the
<literal>JArray</literal> template, which is defined as follows:</para>
<programlisting>
class __JArray : public java::lang::Object
{
public:
  int length;
};

template&lt;class T&gt;
class JArray : public __JArray
{
  T data[0];
public:
  T&amp; operator[](jint i) { return data[i]; }
};
</programlisting>
<para>
For example, if you have a value which has the Java type
<classname>java.lang.String[]</classname>, you can store it in
a C++ variable of type <literal>JArray&lt;java::lang::String*&gt;*</literal>.</para>
<para>
CNI has some convenience typedefs which correspond to typedefs from JNI.
Each is the type of an array holding objects of the appropriate type:
<programlisting>
typedef __JArray *jarray;
typedef JArray&lt;jobject&gt; *jobjectArray;
typedef JArray&lt;jboolean&gt; *jbooleanArray;
typedef JArray&lt;jbyte&gt; *jbyteArray;
typedef JArray&lt;jchar&gt; *jcharArray;
typedef JArray&lt;jshort&gt; *jshortArray;
typedef JArray&lt;jint&gt; *jintArray;
typedef JArray&lt;jlong&gt; *jlongArray;
typedef JArray&lt;jfloat&gt; *jfloatArray;
typedef JArray&lt;jdouble&gt; *jdoubleArray;
</programlisting>
</para>
<para>
<funcsynopsis> 
   <funcdef>template&lt;class T&gt;  T *<function>elements</function></funcdef>
   <paramdef>JArray&lt;T&gt; &amp;<parameter>array</parameter></paramdef>
</funcsynopsis>
   This template function can be used to get a pointer to the
   elements of the <parameter>array</parameter>.
   For instance, you can fetch a pointer
   to the integers that make up an <literal>int[]</literal> like so:
<programlisting>
extern jintArray foo;
jint *intp = elements (foo);
</programlisting>
The name of this function may change in the future.</para>
<para>
 You can create an array of objects using this function:
<funcsynopsis> 
   <funcdef>jobjectArray <function>JvNewObjectArray</function></funcdef>
   <paramdef>jint <parameter>length</parameter></paramdef>
   <paramdef>jclass <parameter>klass</parameter></paramdef>
   <paramdef>jobject <parameter>init</parameter></paramdef>
   </funcsynopsis>
   Here <parameter>klass</parameter> is the type of elements of the array;
   <parameter>init</parameter> is the initial
   value to be put into every slot in the array.
</para>
<para>
For each primitive type there is a function which can be used
   to create a new array holding that type.  The name of the function
   is of the form
   <literal>JvNew<replaceable>Type</replaceable>Array</literal>,
   where <replaceable>Type</replaceable> is the name of
   the primitive type, with its initial letter in upper-case.  For
   instance, <literal>JvNewBooleanArray</literal> can be used to create
   a new array of booleans.
   Each such function follows this example:
<funcsynopsis>  
   <funcdef>jbooleanArray <function>JvNewBooleanArray</function></funcdef> 
   <paramdef>jint <parameter>length</parameter></paramdef>
</funcsynopsis>
</para>
<para>
<funcsynopsis>
   <funcdef>jsize <function>JvGetArrayLength</function></funcdef>
   <paramdef>jarray <parameter>array</parameter></paramdef> 
   </funcsynopsis>
   Returns the length of <parameter>array</parameter>.</para>
<para>
Unlike in Java, array bounds checking in C++ code is not automatic;
it must be done by hand.
</para>
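<para>
A hand-written bounds check might look like the following sketch.  It does
not use the real <literal>JArray</literal> template:
<literal>ArraySketch</literal> and <literal>checked_at</literal> are
hypothetical, the elements live in a <literal>std::vector</literal>, and
<literal>std::out_of_range</literal> stands in for the Java
<literal>ArrayIndexOutOfBoundsException</literal>.</para>
<programlisting>
#include &lt;cstddef&gt;
#include &lt;stdexcept&gt;
#include &lt;vector&gt;

// Hypothetical stand-in for a Java array: a length plus elements.
template&lt;class T&gt;
struct ArraySketch
{
  int length;
  std::vector&lt;T&gt; data;

  // Negative lengths are clamped to zero in this sketch.
  explicit ArraySketch(int n)
    : length(n &lt; 0 ? 0 : n),
      data(static_cast&lt;std::size_t&gt;(length))
  {}

  // Unchecked access, like indexing the raw elements.
  T &amp;operator[](int i) { return data[i]; }
};

// The check that Java code gets for free and C++ code must spell out.
template&lt;class T&gt;
T &amp;checked_at(ArraySketch&lt;T&gt; &amp;array, int i)
{
  if (i &lt; 0 || i &gt;= array.length)
    throw std::out_of_range("ArrayIndexOutOfBoundsException");
  return array[i];
}
</programlisting>
<para>
Code that has already validated its loop bounds can keep using the
unchecked <literal>operator[]</literal> and pay nothing for the check.</para>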
</sect2>
</sect1>

<sect1><title>Methods</title>

<para>
Java methods are mapped directly into C++ methods.
The header files generated by <literal>gcjh</literal>
include the appropriate method definitions.
Basically, the generated methods have the same names and
<quote>corresponding</quote> types as the Java methods,
and are called in the natural manner.</para>

<sect2><title>Overloading</title>
<para>
Both Java and C++ provide method overloading, where multiple
methods in a class have the same name, and the correct one is chosen
(at compile time) depending on the argument types.
The rules for choosing the correct method are (as expected) more complicated
in C++ than in Java, but the fundamental idea is the same.
Given a set of overloaded methods generated by <literal>gcjh</literal>
the C++ compiler will choose the expected one,
as long as each primitive Java type maps to a distinct C++ type.</para>
<para>
Common assemblers and linkers are not aware of C++ overloading,
so the standard implementation strategy is to encode the
parameter types of a method into its assembly-level name.
This encoding is called <firstterm>mangling</firstterm>,
and the encoded name is the <firstterm>mangled name</firstterm>.
The same mechanism is used to implement Java overloading.
The name mangling used by CNI must be the same as that
generated by GCJ.</para>
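<para>
The requirement that each primitive Java type map to a distinct C++ type
can be seen in a small overload set.  The typedefs are hypothetical
(<literal>sketch_</literal> prefix); the point is only that the four
parameter types differ, so the compiler can tell the overloads apart and
give each its own mangled name.  If two of the typedefs named the same
C++ type, the corresponding definitions would collide.</para>
<programlisting>
#include &lt;cstdint&gt;
#include &lt;string&gt;

// Hypothetical typedefs; each one names a different C++ type.
typedef std::int16_t sketch_jshort;
typedef std::int32_t sketch_jint;
typedef std::int64_t sketch_jlong;
typedef char16_t     sketch_jchar;   // distinct from every integer type

// One overload per Java primitive type, as a generated header
// would declare for an overloaded Java method.
std::string describe(sketch_jshort) { return "short"; }
std::string describe(sketch_jint)   { return "int"; }
std::string describe(sketch_jlong)  { return "long"; }
std::string describe(sketch_jchar)  { return "char"; }
</programlisting>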
</sect2>

<sect2><title>Instance methods</title>
<para>
Virtual method dispatch is handled essentially the same
in C++ and Java -- <abbrev>i.e.</abbrev> by doing an
indirect call through a function pointer stored in a per-class virtual
function table.  C++ is more complicated because it has to support
multiple inheritance, but this does not affect Java classes.
G++ historically used a different calling convention
that was not compatible with the one used by <acronym>GCJ</acronym>.
</para>
<para>
The first two elements of the virtual function table
are used for special purposes in both GNU Java and C++;  in Java,
the first points to the class that owns the virtual function
table, and the second is used for an object descriptor that is used by
the GC mark procedure.
</para>
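<para>
The table layout just described can be modelled in a few lines.  This is a
sketch, not the GCJ data structures: the names
<literal>VTableSketch</literal>, <literal>ClassSketch</literal> and
<literal>ObjectSketch</literal> are hypothetical, and the table holds a
single method so that the indirect call is easy to follow.</para>
<programlisting>
#include &lt;cassert&gt;

struct ObjectSketch;
typedef double (*Method)(ObjectSketch *);

struct ClassSketch { const char *name; };

// Hypothetical dispatch table: two special slots, then the methods.
struct VTableSketch
{
  const ClassSketch *owner;    // slot 0: class that owns the table
  const void *gc_descriptor;   // slot 1: used by the GC mark procedure
  Method methods[1];           // virtual methods follow
};

struct ObjectSketch
{
  const VTableSketch *vtable;  // object header
  int value;
};

static double integer_doubleValue(ObjectSketch *self)
{
  return static_cast&lt;double&gt;(self-&gt;value);
}

static const ClassSketch integer_class = { "Integer" };
static const VTableSketch integer_vtable =
  { &amp;integer_class, nullptr, { integer_doubleValue } };

// What a virtual call amounts to: load the table from the object,
// index it, and call through the function pointer.
double call_doubleValue(ObjectSketch *x)
{
  assert(x != nullptr);
  return x-&gt;vtable-&gt;methods[0](x);
}
</programlisting>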
<para>
Calling a Java instance method in <acronym>CNI</acronym> is done
using the standard C++ syntax.  For example:
<programlisting>
  java::lang::Number *x;
  if (x-&gt;doubleValue() &gt; 0.0) ...
</programlisting>
</para>
<para>
Defining a Java native instance method is also done the natural way:
<programlisting>
#include &lt;java/lang/Integer.h&gt;
jdouble
java::lang::Integer::doubleValue()
{
  return (jdouble) value;
}
</programlisting>
</para>
</sect2>

<sect2><title>Interface method calls</title>
<para>
A Java class can <firstterm>implement</firstterm> zero or more
<firstterm>interfaces</firstterm>, in addition to inheriting from
a single base class. 
An interface is a collection of constants and method specifications.
Interfaces provide a subset of the
functionality of C++ abstract virtual base classes, but they
are currently implemented differently. Since interfaces
are infrequently used by Java native methods, we have not modified G++
to allow for method calls via interface pointers.  In the future we
might add an explicit mechanism to CNI to allow this.</para>
</sect2>

<sect2><title>Static methods</title>
<para>
Static Java methods are invoked in <acronym>CNI</acronym> using the standard
C++ syntax, using the `<literal>::</literal>' operator rather
than the `<literal>.</literal>' operator.  For example:
</para>
<programlisting>
jint i =
  java::lang::Math::round((jfloat) 2.3);
</programlisting>
<para>
A static native method is defined using the standard C++ method
definition syntax.  For example:</para>
<programlisting>
#include &lt;java/lang/Integer.h&gt;
java::lang::Integer*
java::lang::Integer::getInteger(jstring s)
{
  ...
}
</programlisting>
</sect2>

<sect2><title>Object allocation</title>
<para>
New Java objects are allocated using a
<firstterm>class-instance-creation-expression</firstterm>:
<programlisting>
new <replaceable>Type</replaceable> ( <replaceable>arguments</replaceable> )
</programlisting>
The same syntax is used in C++.
In both languages, the <literal>new</literal>-expression actually
performs two separate operations: allocating an instance, and then running
the instance initializer (constructor).</para>
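<para>
The two operations can be written out by hand in standard C++.  The
following is only a sketch in plain C++ (not CNI); the class
<literal>Point</literal> and the functions <literal>make_point</literal>
and <literal>destroy_point</literal> are hypothetical names introduced
for illustration.
</para>
<programlisting><![CDATA[
```cpp
#include <new>      // ::operator new and placement new

// Hypothetical class used only for illustration.
struct Point {
  int x, y;
  Point(int x0, int y0) : x(x0), y(y0) {}
};

// Does by hand what the single expression `new Point(x, y)` does.
Point *make_point(int x, int y)
{
  void *raw = ::operator new(sizeof(Point)); // step 1: allocate storage
  try {
    return new (raw) Point(x, y);            // step 2: run the constructor
  } catch (...) {
    ::operator delete(raw);                  // plain C++ frees on failure
    throw;
  }
}

// Matching two-step teardown for a plain C++ object.
void destroy_point(Point *p)
{
  p->~Point();             // run the destructor
  ::operator delete(p);    // release the storage
}
```
]]></programlisting>
<para>
For a Java class only the first step differs in kind: the storage comes
from a run-time routine that returns a garbage-collected object, so
there is no explicit teardown.
</para>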
<para>
Using <acronym>CNI</acronym>, you can allocate a new object
using standard C++ syntax.  The C++ compiler is smart enough to
realize the class is a Java class, and in that case generates a call
to a run-time routine that allocates a garbage-collected object.
If you have overloaded constructors, the compiler will choose the correct one
using standard C++ overload resolution rules.  For example:
<programlisting>
java::util::Hashtable *ht
   = new java::util::Hashtable(120);
</programlisting>
</para>
<para>
In G++, constructors get passed an
extra magic argument, which is not passed for Java constructors.
G++ also has the constructors set up the vtable pointers.
In Java, the object allocator sets up the vtable pointer,
and the constructor does not change the vtable pointer.
Hence, the G++ compiler needs to know about these differences.
</para>
<para>
Allocating an array is a special case,
since the space needed depends on the run-time length given.</para>
</sect2>

<!--
<sect2><title>Object finalization</title>
<para>
A Java methods with the special name <function>finalize</function>
serves some of the function as a C++ destructor method.
The latter is responsible for freeing up any resources owned
by the object before it is destroyed, including deleting
any sub-objects it points.  In Java, the garbage collector will
take care of deleting no-longer-needed sub-objects, so there
is much less need for finalization, but it is occasionally needed.
</para>
<para>
It might make sense to consider the C++ syntax for a finalizer:
<literal>~<replaceable>ClassName</replaceable></literal>
as being equivalent to the Java <function>finalize</function> method.
That would mean that if class that inherits from
<literal>java.lang.Object</literal> defined a C++-style destructor,
it would be equivalent to defining a <function>finalize</function> method.
However, I see no useful need solved by doing that.
Instead:  If you want to define or invoke a Java finalizer from C++ code,
you will need to define or invoke a method named <function>finalize</function>.
</para>
<para>
In this proposed hybrid C++/Java environment, there is no clear
distinction between C++ and Java objects.  Java objects inherit
from <literal>java.lang.Object</literal>, and are garbage collected.
On the other hand, regular C++ objects are not garbage collected,
but must be explicitly deleted.
It may be useful to support C++ objects (that do <emphasis>not</emphasis>
inherit from <literal>java.lang.Object</literal>) that would want to be
garbage collected.  CNI will probably provide a way to do that,
by overloading <literal>operator new</literal>.
</para>
<para>
What happens if you explicitly <literal>delete</literal> an object
(Java or C++) that is garbage collected?  The Ellis/Detlefs garbage
collection proposal for C++ says that should cause the finalizer
to be run, but otherwise whether the object memory is freed
is unpredictable;  that seems reasonable to me.
</para>
</sect2>
-->
</sect1>

<sect1><title>Sharing code for JNI and CNI</title>
<para>
It would be nice to combine the advantages of CNI with the portability of JNI.
That is, people should be able to write native code that can be compiled for
either CNI or JNI.  This can be done using conditional compilation,
with separate code for JNI and CNI, but of course that makes writing native
methods even worse than plain JNI.  It should be possible to use some
pre-processing tricks to reduce the duplication, though.</para>
<para>
It would be appealing to be able to write CNI code and automatically
translate it to JNI code.  However, recognizing where JNI calls are
needed would require something with the sophistication of a C++ compiler.
The logical thing would be to use a C++ compiler with a suitable option,
say <literal>--emit-jni</literal>.  Then instead of actually
generating JNI C source, this compiler
would generate machine code that calls the appropriate JNI routines.
I.e. rather than actually getting JNI C code, you would get machine code
equivalent to that generated from JNI C code.  This is not quite as
portable as pure JNI source, but it would be portable to all JVMs that
run on platforms to which this compiler has been ported.  For G++,
that is almost all platforms in use.</para>
<para>
Compiling CNI to JNI-using binaries might involve some
combination of G++ changes, an extension to gcjh, and run-time code.
Some ideas have been suggested, but there are no actual plans for such
a project.  Note that 100% automatic
translation might be difficult, so you might have to put in
<literal>#ifdef JNI</literal> conditionals occasionally, but the goal
would be to minimize JNI-specific code.
</para>
</sect1>

<!--
<sect1><title>Using the C language</title>
<para>
Some programmers might prefer to write Java native methods using C.
The main advantages of that are that C is more universally available
and more portable.  However, if portability to multiple Java implementations
is important, one should use the JNI.  Still, it might be nice to have
<literal>Jv</literal>-style macros that would allow one to select between
portable JNI-based C, or Kaffe-optimize CNI.  The problem is that an
efficient CNI-style interface is much more inconvenient in C than in C++.
In C++, we can have the compiler handle inheritance, exception handling,
name mangling of methods, and so on.  In C the programmmer would have to
do much more of this by hand.  It should be possible to come up with a
set of macros for programmers willing to do that.  I am not convinced
that this is a high priority, given that most environments that support
C and Java will also support C++.  The main issue is whether it is OK
to require a C++ compiler to build the Kaffe native methods.
If using C++ makes it easier to write core Java libraries more efficiently,
I think the trade-off is worth it.
</para>
</sect1>
-->

<sect1><title>Packages</title>
<para>
The only global names in Java are class names and package names.
A <firstterm>package</firstterm> can contain zero or more classes, and
also zero or more sub-packages.
Every class belongs to either an unnamed package or a package that
has a hierarchical and globally unique name.
</para>
<para>
A Java package is mapped to a C++ <firstterm>namespace</firstterm>.
The Java class <literal>java.lang.String</literal>
is in the package <literal>java.lang</literal>, which is a sub-package
of <literal>java</literal>.  The C++ equivalent is the
class <literal>java::lang::String</literal>,
which is in the namespace <literal>java::lang</literal>,
which is in the namespace <literal>java</literal>.
</para>
<para>
The suggested way to declare such a class in C++ is:
<programlisting>
// Declare the class(es).
// (Possibly in a header file.)
namespace java {
  namespace lang {
    class Object;
    class String;
  }
}

class java::lang::String
  : public java::lang::Object
{
  ...
};
</programlisting>
</para>

<sect2><title>Leaving out package names</title>
<para>
Having to always type the fully-qualified class name is verbose.
It also makes it more difficult to change the package containing a class.
The Java <literal>package</literal> declaration specifies that the
following class declarations are in the named package, without having
to explicitly name the full package qualifiers.
The <literal>package</literal> declaration can be followed by zero or
more <literal>import</literal> declarations, which allows either
a single class or all the classes in a package to be named by a simple
identifier.  C++ provides something similar
with the <literal>using</literal> declaration and directive.
</para>
<para>
A Java simple-type-import declaration:
<programlisting>
import <replaceable>PackageName</replaceable>.<replaceable>TypeName</replaceable>;
</programlisting>
allows using <replaceable>TypeName</replaceable> as a shorthand for
<literal><replaceable>PackageName</replaceable>.<replaceable>TypeName</replaceable></literal>.
The C++ (more-or-less) equivalent is a <literal>using</literal>-declaration:
<programlisting>
using <replaceable>PackageName</replaceable>::<replaceable>TypeName</replaceable>;
</programlisting>
</para>
<para>
A Java import-on-demand declaration:
<programlisting>
import <replaceable>PackageName</replaceable>.*;
</programlisting>
allows using <replaceable>TypeName</replaceable> as a shorthand for
<literal><replaceable>PackageName</replaceable>.<replaceable>TypeName</replaceable></literal>,
for any type in the package.
The C++ (more-or-less) equivalent is a <literal>using</literal>-directive:
<programlisting>
using namespace <replaceable>PackageName</replaceable>;
</programlisting>
</para>
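<para>
The difference between the two forms can be tried out in plain C++.
The following sketch does not use CNI headers; the namespace
<literal>demo::util</literal> and the classes <literal>Counter</literal>
and <literal>Label</literal> are hypothetical stand-ins for a Java
package and its classes.
</para>
<programlisting><![CDATA[
```cpp
#include <string>

// Hypothetical stand-ins for a Java package "demo.util" with two classes.
namespace demo {
  namespace util {
    struct Counter { int n = 0; int next() { return ++n; } };
    struct Label   { std::string text; };
  }
}

// Like "import demo.util.Counter;": only Counter becomes a simple name.
int count_twice()
{
  using demo::util::Counter;
  Counter c;
  c.next();
  return c.next();
}

// Like "import demo.util.*;": every name in the namespace is visible.
std::string make_label()
{
  using namespace demo::util;
  Label l{"ok"};
  Counter c;
  return l.text + std::to_string(c.next());
}
```
]]></programlisting>
<para>
Placing the <literal>using</literal> forms inside a function body keeps
the shortened names local, which is one respect in which the C++ forms
are only roughly equivalent to a Java <literal>import</literal>.
</para>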
</sect2>
</sect1>

<sect1><title>Exception Handling</title>
<para>
A goal of the Gcc exception handling mechanism is to be as language independent
as possible.  The Java features are again
a subset of the G++ features, in that C++ allows near-arbitrary values
to be thrown, while Java only allows throwing of references to
objects that inherit from <literal>java.lang.Throwable</literal>.
While G++ and GCJ share a common exception handling framework,
things are not yet perfectly integrated.  The main issue is that the
<quote>run-time type information</quote> facilities of the two
languages are not integrated.</para>
<para>
Still, things work fairly well.  You can throw a Java exception from
C++ using the ordinary <literal>throw</literal> construct, and this
exception can be caught by Java code.  Similarly, you can catch an
exception thrown from Java using the C++ <literal>catch</literal>
construct.</para>
<para>
Note that currently you cannot mix C++ catches and Java catches in a
single C++ translation unit.  This is caused by a limitation in GCC's
internal processing of exceptions, and we do intend to fix this eventually.
</para>
<para>
C++ code that needs to throw a Java exception would
just use the C++ <command>throw</command> statement.  For example:
<programlisting>
if (i >= count) {
  jstring msg =
    JvNewStringUTF("I/O Error!");
  throw new java::io::IOException(msg);
}
</programlisting>
</para>
<para>
There is also no difference between catching a Java exception,
and catching a C++ exception.
The following Java fragment:
<programlisting>
try {
  do_stuff();
} catch (java.io.IOException ex) {
  System.out.println("caught I/O Error");
} finally {
  cleanup();
}
</programlisting>
could be expressed this way in G++:
<programlisting>
try {
  try {
    do_stuff();
  } catch (java::io::IOException* ex) {
    printf("caught I/O Error\n");
  }
} catch (...) {
  cleanup();
  throw;  // re-throws exception
}
cleanup();  // normal exit: finally also runs when nothing escapes
</programlisting>
Note that in C++ we need to use two nested <literal>try</literal> statements.
</para>
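<para>
An alternative to nesting <literal>try</literal> statements is the usual
C++ idiom of a guard object whose destructor does the cleanup.  The
following is a sketch in plain C++, not CNI code: the types
<literal>IOError</literal> and <literal>Finally</literal>, the counter
<literal>cleanups</literal>, and the function <literal>run</literal> are
hypothetical names.  The exception is thrown by pointer only to mirror
the way Java exceptions are thrown from CNI.
</para>
<programlisting><![CDATA[
```cpp
#include <string>

// Hypothetical stand-in for java::io::IOException (plain C++, no CNI).
struct IOError { const char *msg; };

static int cleanups = 0;          // counts how often cleanup() ran
static void cleanup() { ++cleanups; }

// Guard whose destructor plays the role of the Java "finally" block.
struct Finally {
  ~Finally() { cleanup(); }
};

// Reports what happened; cleanup() runs on every path out of run().
std::string run(bool fail)
{
  static IOError error = { "I/O Error" };
  Finally guard;                  // destructor runs when run() exits
  try {
    if (fail)
      throw &error;               // thrown by pointer, as in CNI
    return "ok";
  } catch (IOError *ex) {
    return std::string("caught ") + ex->msg;
  }
}
```
]]></programlisting>
<para>
The guard covers the normal return, the caught exception, and any
exception that propagates out of <literal>run</literal>, so no second
<literal>try</literal> is needed.
</para>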
</sect1>

<sect1><title>Exception Generation</title>
<para>
Java code is extensively checked at runtime.  For instance, if a Java
program recurses too deeply, a
<literal>StackOverflowError</literal> is generated.  Likewise, if
a null pointer is dereferenced, a
<literal>NullPointerException</literal> is generated.  This is
typically done by the Java runtime.</para>
<para>
GCJ is currently weak on these checks.  Explicit null pointer
checks are generated in the specific case of calling a
<literal>final</literal> function.  In other cases we rely on the
runtime to trap segmentation violations and turn them into
<literal>NullPointerException</literal>.  However, this approach only
works on platforms with MMU support.  In the future we plan to give
gcj the ability to automatically generate explicit checks for null
pointers and then generate the appropriate exception.  When this
happens we will most likely not modify the C++ compiler to do this,
but will instead rely on the CNI programmer to add explicit checks by
hand.
</para>
<para>The same considerations apply to stack overflows.</para>
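<para>
A hand-written null check of the kind described above might look like
the following.  This is a sketch in plain C++ with hypothetical
stand-ins (<literal>NullPointerError</literal>, <literal>Node</literal>,
<literal>value_of</literal>, <literal>rejects</literal>); CNI code would
throw a Java exception object instead of the stand-in used here.
</para>
<programlisting><![CDATA[
```cpp
// Hypothetical stand-ins (plain C++, no CNI).
struct NullPointerError {};
struct Node { int value; };

// The explicit check is written by hand, before the dereference.
int value_of(Node *n)
{
  static NullPointerError npe;
  if (n == nullptr)
    throw &npe;                   // stand-in for throwing a Java exception
  return n->value;
}

// Returns true if value_of rejected the pointer by throwing.
bool rejects(Node *n)
{
  try {
    value_of(n);
    return false;
  } catch (NullPointerError *) {
    return true;
  }
}
```
]]></programlisting>
<para>
Unlike the trap-based approach, an explicit test of this kind does not
depend on MMU support.
</para>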
</sect1>

<sect1><title>Synchronization</title>
<para>
Each Java object has an implicit monitor.
The Java VM uses the instruction <literal>monitorenter</literal> to acquire
and lock a monitor, and <literal>monitorexit</literal> to release it.
The JNI has corresponding methods <literal>MonitorEnter</literal>
and <literal>MonitorExit</literal>.  The corresponding CNI macros
are <literal>JvMonitorEnter</literal> and <literal>JvMonitorExit</literal>.
</para>
<para>
The Java source language does not provide direct access to these primitives.
Instead, there is a <literal>synchronized</literal> statement that does an
implicit <literal>monitorenter</literal> before entry to the block,
and does a <literal>monitorexit</literal> on exit from the block.
Note that the lock has to be released even if the block is abnormally
terminated by an exception, which means there is an implicit
<literal>try</literal>-<literal>finally</literal>.
</para>
<para>
From C++, it makes sense to use a destructor to release a lock.
CNI defines the following utility class.
<programlisting>
class JvSynchronize {
  jobject obj;

public:
  JvSynchronize(jobject o)
  { obj = o; JvMonitorEnter(o); }

  ~JvSynchronize()
  { JvMonitorExit(obj); }
};
</programlisting>
The equivalent of Java's:
<programlisting>
synchronized (OBJ) { CODE; }
</programlisting>
can be simply expressed:
<programlisting>
{ JvSynchronize dummy(OBJ); CODE; }
</programlisting>
</para>
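<para>
The same pattern can be exercised in plain C++ without libgcj.  The
following sketch is an analogue, not the CNI class: the names
<literal>Monitor</literal>, <literal>Synchronize</literal> and
<literal>released_after_throw</literal> are hypothetical, and a
<literal>std::recursive_mutex</literal> is assumed as the closest
standard counterpart of a reentrant Java monitor.
</para>
<programlisting><![CDATA[
```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a Java object's monitor.  Java monitors are
// reentrant, so a recursive mutex is the closest standard C++ analogue.
struct Monitor {
  std::recursive_mutex m;
  int depth = 0;                  // how many times the owner holds it
};

// Same shape as JvSynchronize: lock in the constructor, unlock in the
// destructor, so the lock is released however the block is left.
class Synchronize {
  Monitor &mon;
public:
  explicit Synchronize(Monitor &o) : mon(o) { mon.m.lock(); ++mon.depth; }
  ~Synchronize() { --mon.depth; mon.m.unlock(); }
  Synchronize(const Synchronize &) = delete;
  Synchronize &operator=(const Synchronize &) = delete;
};

// Returns true if the monitor was released although the block threw.
bool released_after_throw(Monitor &mon)
{
  try {
    Synchronize guard(mon);
    throw std::runtime_error("abnormal exit");
  } catch (const std::runtime_error &) {
    return mon.depth == 0;        // guard was destroyed during unwinding
  }
}
```
]]></programlisting>
<para>
The destructor runs during stack unwinding, which gives the implicit
<literal>try</literal>-<literal>finally</literal> behaviour of the Java
<literal>synchronized</literal> statement.
</para>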
<para>
Java also has methods with the <literal>synchronized</literal> attribute.
This is equivalent to wrapping the entire method body in a
<literal>synchronized</literal> statement.
(Alternatively, an implementation could require the caller to do
the synchronization.  This is not practical for a compiler, because
each virtual method call would have to test at run-time if
synchronization is needed.)  Since in <literal>GCJ</literal>
the <literal>synchronized</literal> attribute is handled by the
method implementation, it is up to the programmer
of a synchronized native method to handle the synchronization
(in the C++ implementation of the method).
In other words, you need to manually add <literal>JvSynchronize</literal>
in a <literal>native synchronized</literal> method.</para>
</sect1>

<!--
<sect1><title>Improved String implementation</title>
<para>
The standard Java implementation of <classname>String</classname>
is inefficient in that
every string requires <emphasis>two</emphasis> objects:
A <literal>java.lang.String</literal> object, which contains a
reference to an internal <literal>char</literal> array, which
contains the actual character data.
If we allow the actual <literal>java.lang.String</literal> object
to have a size that varies depending on how many characters it contains
(just like array objects vary in size), we can save the overhead of
the extra object.  This saves space, reduces cache misses,
and reduces garbage collection over-head.
</para>
<programlisting>
class java::lang::String
  : public java::lang::Object
{
  jint length;  /* In characters. */
  jint offset;  /* In bytes, from start of base. */
  Object *base; /* Either this or another String or a char array. */

private:
  jchar&amp; operator[](jint i) { return ((jchar*)((char*)base+offset))[i]; }

public:
  jchar charAt(jint i)
  {
    if ((unsigned32) i >= length)
      throw new IndexOutOfBoundsException(i);
    return (*this)[i];
  }

  String* substring (jint beginIndex,
                     jint endIndex)
  {
    ...  check for errors ...;
    String *s = new String();
    s.base = base;
    s.length = endIndex - beginIndex;
    s.offset = (char*) &amp;base[beginIndex]
      - (char*) base;
    return s;
  }
  ...
}
</programlisting>
<para>
The tricky part about variable-sized objects is that we can no longer
cleanly separate object allocation from object construction,
since the size of the object to be allocated depends on the arguments
given to the constructor.  We can deal with this fairly straight-forwardly
from C++ or when compiling Java source code.  It is more complicated
(though quite doable) when compiling from Java byte-code.  We don't
have to worry about that, since in any case we have to support
the less efficient scheme with separate allocation and construction.
(This is needed for JNI and reflection compatibility.)</para>
</sect1>
-->

<sect1><title>Class Initialization</title>
<para>
Java requires that each class be automatically initialized at the time 
of the first active use.  Initializing a class involves 
initializing the static fields, running code in class initializer 
methods, and initializing base classes.  There may also be 
some implementation specific actions, such as allocating 
<classname>String</classname> objects corresponding to string literals in
the code.</para>
<para>
The GCJ compiler inserts calls to <literal>JvInitClass</literal> (actually
<literal>_Jv_InitClass</literal>) at appropriate places to ensure that a
class is initialized when required.  The C++ compiler does not
insert these calls automatically - it is the programmer's
responsibility to make sure classes are initialized.  However,
this is fairly painless because of the conventions assumed by the GCJ
system.</para>
<para>
First, <literal>libgcj</literal> will make sure a class is initialized
before an instance of that class is created.  This is one
of the responsibilities of the <literal>new</literal> operation.  This is
taken care of both in Java code, and in C++ code.  (When the G++
compiler sees a <literal>new</literal> of a Java class, it will call
a routine in <literal>libgcj</literal> to allocate the object, and that
routine will take care of initializing the class.)  It follows that you can
access an instance field, or call an instance (non-static)
method and be safe in the knowledge that the class and all
of its base classes have been initialized.</para>
<para>
Invoking a static method is also safe.  This is because the
Java compiler adds code to the start of a static method to make sure
the class is initialized.  However, the C++ compiler does not
add this extra code.  Hence, if you write a native static method
using CNI, you are responsible for calling <literal>JvInitClass</literal>
before doing anything else in the method (unless you are sure
it is safe to leave it out).</para>
<para>
Accessing a static field also requires the class of the
field to be initialized.  The Java compiler will generate code
to call <literal>_Jv_InitClass</literal> before getting or setting the field.
However, the C++ compiler will not generate this extra code,
so it is your responsibility to make sure the class is
initialized before you access a static field.</para>
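<para>
The discipline for static methods and static fields can be illustrated
with a plain C++ analogue.  This is a single-threaded sketch, not
libgcj code: the class <literal>Config</literal> and its members are
hypothetical, and <literal>init_class</literal> merely plays the role
that <literal>JvInitClass</literal> plays in CNI.
</para>
<programlisting><![CDATA[
```cpp
// Hypothetical stand-in for a Java class with one static field.
// Single-threaded sketch; a real run-time must also handle concurrency.
class Config {
  static bool initialized;
  static int init_runs;           // how many times the initializer ran
  static int limit_;              // the "static field"
public:
  // Plays the role of JvInitClass: run the static initializer once.
  static void init_class()
  {
    if (initialized)
      return;
    initialized = true;
    ++init_runs;
    limit_ = 42;                  // body of the "static initializer"
  }

  // A native static method initializes the class before anything else.
  static int limit()
  {
    init_class();
    return limit_;
  }

  static int runs() { return init_runs; }
};

bool Config::initialized = false;
int Config::init_runs = 0;
int Config::limit_ = 0;
```
]]></programlisting>
<para>
Calling the initialization routine again is cheap and harmless, which is
why a native static method can simply call it first.
</para>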
</sect1>

<sect1><title>Changes to G++</title>
<para>
Here is a summary of changes made to G++, the GNU C++ compiler,
to make it aware of Java types, and thus provide
the C++/Java interoperability we have discussed:</para>
<itemizedlist>
<listitem><para>
For each Java primitive type (such as <classname>long</classname>),
G++ defines a C++ primitive type, with a name
like <classname>__java_long</classname>.  (A name with two initial
underscores is reserved to the implementation, so it cannot clash with
a valid user identifier.)  These types are distinct from
all other types.  The CNI header files will then do:
<programlisting>
typedef __java_long jlong;
</programlisting>
This mechanism makes it easy for the compiler to distinguish Java
types from standard C++ types, giving them the correct size in bits.
Each of these types has a special <quote>Java type</quote> flag bit set.
When mangling the name of a method into an assembler label, Java types
are recognized.  For example <classname>__java_long</classname> is mangled
the same as the C++ 64-bit integer type <classname>long long</classname>.
</para></listitem>
<listitem><para>
The compiler recognizes <literal>extern "Java"</literal> in addition to
the standard <literal>extern "C"</literal> and <literal>extern "C++"</literal>.
If a class is defined inside the scope of <literal>extern "Java"</literal>,
then the compiler sets the <quote>Java type</quote> bit on the class and
the corresponding pointer type.
The <quote>Java type</quote> bit is also set for a class if its
base class has the <quote>Java type</quote> bit set.
The standard libgcj/CNI header files
define <classname>java::lang::Object</classname>
inside <literal>extern "Java"</literal>; thus all classes that inherit
from <classname>java::lang::Object</classname> have
the <quote>Java type</quote> bit set.
</para></listitem>
<listitem><para>
There is a compiler internal function that
checks if a type is a valid Java type.  This is used to catch errors such as
when a programmer accidentally types <classname>long</classname> instead of
<classname>jlong</classname> in the function header of a Java class.
</para></listitem>
<listitem><para>
The compiler has an internal function that
takes a Java type, and generates a declaration that refers to the
corresponding (run-time) <classname>java.lang.Class</classname> instance.
This is used for exception handling and object allocation.
</para></listitem>
<listitem><para>
If when compiling a <classname>new</classname>-expression the allocated
type is a Java type, then the compiler generates a call to the run-time
routine <function>_Jv_AllocObject</function>.  The compiler also
suppresses generating code to cause the object to be de-allocated if the
constructor throws an exception.  (Such de-allocation is mandated by the
C++ standard, but is not correct for Java, which assumes a garbage collector.)
</para></listitem>
<listitem><para>
The interface to constructors needs to be changed so magic
vtable pointer initialization and the extra constructor argument
do not happen when constructing a Java object.
</para></listitem>
<listitem><para>
C++ has the problem that the compiler cannot tell which compilation
unit needs to emit a class's virtual function table.  Various rules and
heuristics are used, but sometimes the same vtable has to be emitted by
more than one compilation unit.  This is not an issue for Java types:
G++ never emits the vtable, since that is done
when GCJ compiles the Java class.
</para></listitem>
<listitem><para>
G++ handles <literal>catch</literal> and <literal>throw</literal>
by generating appropriate libgcj calls.
</para></listitem>
</itemizedlist>
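<para>
The size guarantee in the first item can be approximated in standard
C++.  The following sketch assumes the fixed-width integer types as
stand-ins, since plain C++ has no <classname>__java_long</classname>;
the namespace <literal>sketch</literal> and the helper
<literal>bits</literal> are hypothetical names.
</para>
<programlisting><![CDATA[
```cpp
#include <climits>
#include <cstdint>

// Stand-ins for the compiler's built-in Java types (assumption: plain
// C++ has no __java_long, so fixed-width integer types are used here).
namespace sketch {
  typedef std::int8_t   jbyte;
  typedef std::int16_t  jshort;
  typedef std::int32_t  jint;
  typedef std::int64_t  jlong;
  typedef std::uint16_t jchar;    // Java char is an unsigned 16-bit value
}

// Size of a type in bits.
template <typename T>
constexpr int bits() { return static_cast<int>(sizeof(T) * CHAR_BIT); }

// Compile-time checks of the sizes Java requires.
static_assert(bits<sketch::jbyte>()  == 8,  "Java byte is 8 bits");
static_assert(bits<sketch::jshort>() == 16, "Java short is 16 bits");
static_assert(bits<sketch::jint>()   == 32, "Java int is 32 bits");
static_assert(bits<sketch::jlong>()  == 64, "Java long is 64 bits");
static_assert(bits<sketch::jchar>()  == 16, "Java char is 16 bits");
```
]]></programlisting>
<para>
A <literal>typedef</literal> of a standard type is only an alias, so
this sketch cannot reproduce the other property described above, namely
that the Java types are distinct from all other types and carry a
<quote>Java type</quote> flag.
</para>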
</sect1>

<!--
<sect1><title>Reflection</title>
<para>The types <literal>jfieldID</literal> and <literal>jmethodID</literal>
are as in JNI.</para>
<para>
The function <literal>JvFromReflectedField</literal>,
<literal>JvFromReflectedMethod</literal>,
<literal>JvToReflectedField</literal>, and
<literal>JvToFromReflectedMethod</literal> (as in Java 2 JNI)
will be added shortly, as will other functions corresponding to JNI.</para>
</sect1>
-->

<sect1><title>Performance numbers</title>
<para>
Figure 1 shows some benchmark numbers comparing JNI and CNI.
(The benchmarks are based on code by Matt Welsh.)
These are micro-benchmarks, all run on a 600 MHz Athlon
running RedHat Linux 7 with a pre-2.4 kernel.  Each column measures:
</para>

<figure><title>Benchmark measurements</title>
<tgroup cols="5">
<thead>
<row>
<entry>Test</entry>
<entry>JDK pure Java</entry>
<entry>JDK with JNI</entry>
<entry>GCJ pure Java</entry>
<entry>GCJ with CNI</entry>
</row>
</thead>
<tbody>
<row>
<entry>Void no-op method</entry>
<entry>.027&#xb5;s</entry>
<entry>.077&#xb5;s</entry>
<entry>.026&#xb5;s</entry>
<entry>.028&#xb5;s</entry>
</row>
<row>
<entry>Instance field increment</entry>
<entry>.06&#xb5;s</entry>
<entry>2.88&#xb5;s / .35&#xb5;s</entry>
<entry>.02&#xb5;s</entry>
<entry>.02&#xb5;s</entry>
</row>
<row>
<entry>Static field increment</entry>
<entry>.033&#xb5;s</entry>
<entry>4.39&#xb5;s / .395&#xb5;s</entry>
<entry>.027&#xb5;s</entry>
<entry>.029&#xb5;s</entry>
</row>
</tbody>
</tgroup>
</figure>

<orderedlist>
<listitem><para>Running JDK 1.3 from Sun, calling a test method written
in pure Java.</para></listitem>
<listitem><para>Again running JDK 1.3 from Sun, with the test method written
using JNI.
For those rows that have two numbers, the second number is when
using caching of <classname>jfieldID</classname>s.</para></listitem>
<listitem><para>Using GCJ (version shipped with RedHat 7), calling a
test method written in pure Java.</para></listitem>
<listitem><para>Again using GCJ (version shipped with RedHat 7), calling a
test method written in C++ using CNI.</para></listitem>
</orderedlist>
</sect1>

<sect1><title>Usage and features of gcjh</title>
<figure><title><filename>Timer.h</filename> as generated by gcjh</title>
<programlisting>
// DO NOT EDIT THIS FILE - it is machine generated -*- c++ -*-

#ifndef __timing_Timer__
#define __timing_Timer__

#pragma interface

#include &lt;java/lang/Object.h&gt;

extern "Java"
{
  namespace timing
  {
    class Timer;
  }
};

class ::timing::Timer : public ::java::lang::Object
{
public: // actually package-private
  virtual jlong sinceLast (::java::lang::String *);
  Timer ();
private:
  jlong lastTime;
  ::java::lang::String *lastNote;
public:

  static ::java::lang::Class class$;
};

#endif /* __timing_Timer__ */
</programlisting>
</figure>
<para>
      The <command>gcjh</command> program is used to generate C++ header files from
      Java class files.  By default, <command>gcjh</command> generates
      a relatively straightforward C++ header file.  However, there
      are a few caveats to its use, and a few options which can be
      used to change how it operates.  We don't list the options here,
      as they aren't too relevant to this discussion.
</para>
<para>
gcjh will generate all the required namespace declarations and
<literal>#include</literal>'s for the header file.</para>
<para>
Gcjh also has the ability to decompile simple Java methods.
Currently it does this via ad hoc pattern matching -- it recognizes
empty methods, accessor methods, and methods which return
<literal>this</literal> or <literal>null</literal>.  These choices
were made primarily because they were simple to  implement, but also
because they seemed likely to provide good inlining opportunities.
This feature in effect implements cross-language inlining, and we
anticipate more work here in the future.</para>
<!--
<para>
Note that, while gcjh puts <literal>#pragma
interface</literal> in the generated header file, you should
<emphasis>not</emphasis> put <literal>#pragma implementation</literal>
into your C++ source file.  If you do, duplicate definitions of inline
functions will sometimes be created, leading to link-time errors.
For instance, gcj expects that it will always generate the vtable for
a class.  It needs to do this so that the vtable can directly
reference the class' Class object.  For this reason it is not possible
to derive a purely C++ class from a Java class - some Java code must
always be involved.</para>
-->
<para>
gcjh has to handle mismatches between the C++ and Java programming
languages.  There are a few such problems:
</para>
<itemizedlist>
<listitem><para>
gcjh must be careful to generate fields and methods in the same order
as the compiler itself so that the G++ view of object layout is
compatible with the GCJ view.  This means that gcjh has to be updated
whenever either compiler changes its object layout.
</para></listitem>
<listitem><para>
Some valid Java identifiers, like <literal>register</literal>, are
keywords in C++.  We handle this case in gcjh with some collusion by
gcj.  We add a <literal>$</literal> to the end of any name that
conflicts with a C++ keyword.  For instance, a method
<literal>register</literal> appears in the header, and in the
generated object file, as <literal>register$</literal>.  This encoding
is robust -- if the Java code has a method named
<literal>struct$</literal>, it is renamed to
<literal>struct$$</literal>.  Thus, conflicts are never possible.
</para></listitem>
<listitem><para>
In C++ it isn't possible to have a field and a method with the same
name, while in Java this is possible.  If the conflicting field is
static, gcjh simply issues an error.  Otherwise, the field will be
renamed by appending `__' in the generated header; this is safe
because object code that refers to instance fields will not use the
field's name.  In the future we plan to eliminate this problem.  One
approach would be to modify gcj and gcjh to mangle conflicting field
names as is done for identifiers matching C++ keywords.
</para></listitem>
<listitem><para>
In Java it is often convenient to refer to a Class object, for
instance via the <literal>Foo.class</literal> syntax.  In C++ there is
no way to do this, so gcjh makes the class object appear to be a
static field of the class itself.  This field is named
<literal>class$</literal>.  Typical code uses a pointer to the Class
object, e.g. <literal>&amp;java::lang::Class::class$</literal>.
</para></listitem>
<listitem><para>
gcjh assumes that all the methods and fields of a class have ASCII
names.  The C++ compiler cannot correctly handle non-ASCII
identifiers.  gcjh does not currently diagnose this problem.  In the
future we hope to generate headers which use C++ UCNs (the C++
equivalent of Java <literal>\u</literal> escapes) to avoid this problem.
</para></listitem>
</itemizedlist>
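<para>
The keyword renaming rule described in the list above can be sketched
as a small function.  This is an illustration of the rule as stated, not
the code used by gcjh: the function name <literal>mangle</literal> and
the abbreviated keyword table are assumptions made for this example.
</para>
<programlisting><![CDATA[
```cpp
#include <set>
#include <string>

// Illustrative subset of C++ keywords (assumption: not gcjh's real table).
static const std::set<std::string> &keywords()
{
  static const std::set<std::string> table = {
    "class", "delete", "namespace", "register", "struct", "typename"
  };
  return table;
}

// Appends '$' when the name is a C++ keyword followed by zero or more
// '$' characters; every other name is returned unchanged.
std::string mangle(const std::string &name)
{
  std::string::size_type last = name.find_last_not_of('$');
  if (last == std::string::npos)
    return name;                          // empty, or nothing but '$'
  std::string base = name.substr(0, last + 1);
  return keywords().count(base) ? name + "$" : name;
}
```
]]></programlisting>
<para>
Because every name of the form keyword-plus-dollars gains exactly one
<literal>$</literal> and all other names are left alone, two different
Java names never map to the same C++ name.
</para>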
<para>
gcjh can automatically generate C++ stubs for a given class.  For each
native method in the class it will
generate a C++ stub whose default implementation throws an exception.
This feature is convenient when writing a class for the first time.</para>
<para>
An alternative to a separate gcjh program would be modifying G++ so it
could read <filename>.class</filename> files directly.  However,
using gcjh is almost as convenient, and G++ is
already very complex, so adding significant Java-specific changes
seems ill-advised from the perspective of long-term compiler maintenance.
</para>
<para>
Figure 2 is the output of gcjh on the Timer class from the first section.
</para>
</sect1>

<bibliography>
<title>Bibliography</title>

<biblioentry>
<abbrev>Bothner97</abbrev>
<authorgroup>
<author><firstname>Per</firstname> <surname>Bothner</surname></author>
</authorgroup>
<title>A Gcc-based Java Implementation</title>
<bibliomisc>IEEE Compcon '97</bibliomisc>
<pubdate>1997</pubdate>
</biblioentry>

<biblioentry>
<abbrev>JavaSpec</abbrev>
<!--<bookbiblio>-->
<authorgroup>
<author><firstname>James</firstname> <surname>Gosling</surname></author>
<author><firstname>Bill</firstname> <surname>Joy</surname></author>
<author><firstname>Guy</firstname> <surname>Steele</surname></author>
</authorgroup>
<title>The Java Language Specification</title>
<publisher><publishername>Addison-Wesley</publishername></publisher>
<copyright><year>1996</year></copyright>
<!--</bookbiblio>-->
</biblioentry>

</bibliography>
</article>
