Java/C++ integration

Writing native Java methods in natural C++

Per Bothner
Cygnus Solutions

bothner@cygnus.com
1325 Chesapeake Terrace
Sunnyvale, CA 94089,
USA

November, 1997

Background

Not all the code in a Java application can be written in Java. Some must be written in a lower-level language, either for efficiency reasons, or to access low-level facilities not accessible in Java. For this reason, Java methods may be specified as "native". This means that the method has no method body (implementation) in the Java source code. Instead, it has a special flag which tells the Java virtual machine to look for the method using some unspecified lookup mechanism.

Sun's original Java Development Kit (JDK) version 1.0 defined a programming interface for writing native methods in C. This provided rather direct and efficient access to the underlying VM, but was not officially documented, and was tied to specifics of the VM implementation. (There was little attempt to make it an abstract API that could work with any VM.)

This document is a proposal and a work-in-progress. It is not a specification, and Cygnus makes no commitment to implement any part of the proposal. Note also that I use the word "Java" (a trademark of Sun Microsystems) rather casually. This needs to be cleaned up. (Cygnus has not yet decided what we will call our implementation of the Java language platform.)

Asymetrix has a SuperCede Java environment that boasts "seamless" C++/Java integration. That needs to be investigated.

The Java Native Interface

In JDK 1.1, Sun defined a "Java Native Interface" (JNI), the official portable programming interface for writing such "native methods" in C or C++. This is a binary interface (ABI), allowing someone to ship a compiled library of JNI-compiled native code, and have it work with any VM implementation (for that platform). The downside is that it is a rather heavy-weight interface, with substantial overheads. For example, for native code to access a field in an object, it needs to make two function calls (though the result of the first can be saved for future accesses). This is cumbersome to write and slow at run-time. Worse, for some applications, is that the field is specified by a run-time string, and found by searching run-time "reflective" data structures. Thus the JNI requires the availability at run-time of complete reflective data (names, types, and positions of all fields, methods, and classes). The reflective data has other uses (there is a standard set of Java classes for accessing the reflective data), but when memory is tight, it is a luxury many applications do not need.

As an example, here is a small Java class intended for timing purposes. (This could be written in portable Java, but let us assume for some reason we don't want to do that.)

package timing ;
class Timer {
  private long last_time;
  private String last_comment;
  /** Return time in milliseconds since last call,
   * and set last_comment. */
  native long sinceLast(String comment);
}
This is how it could be programmed using the JNI:
 
extern "C" /* specify the C calling convention */
    jlong Java_timing_Timer_sinceLast (
         JNIEnv *env,           /* interface pointer */
         jobject obj,           /* "this" pointer */
         jstring comment)   /* argument #1 */
{
  // Note that the results of the first three statements
  // could be saved for future use (though the results
  // have to be made "global" first).
  jclass cls = env->FindClass("timing/Timer");
  jfieldID last_time_id = env->GetFieldID(cls, "last_time", "J");
  jfieldID last_comment_id = env->GetFieldID(cls, "last_comment",
                                             "Ljava/lang/String;");

  jlong old_last_time = env->GetLongField(obj, last_time_id);
  jlong new_last_time = calculate_new_time();
  env->SetLongField(obj, last_time_id, new_last_time);
  env->SetObjectField(obj, last_comment_id, comment);
  return new_last_time - old_last_time;
}
Note the first env parameter, which is a pointer to a thread-specific area, which also includes a pointer to a table of functions. The entire JNI is defined in terms of these functions, which cannot be inlined (since that would make JNI methods no longer binary compatible across VMs).

The Cygnus Java product will support the JNI, but we will also offer a more efficient, lower-level, and more natural native API. The basic idea is to make GNU Java compatible with GNU C++ (G++), and provide a few hooks in G++ so C++ code can access Java objects as naturally as native C++ objects. The rest of this paper goes into details about this integrated Java/C++ model.

We will go into more detail about this "Kaffe Native Interface" (KNI) in this paper. However, the key is that the calling conventions and data accesses for KNI are the same as for normal nonnative Java methods. Thus there is no extra JNIEnv parameter, and the C++ programmer gets direct access to the VM representation. This does require co-ordination between the C++ and Java implementations.

Here is the earlier example written using KNI:

#include "timing_Timer.h"

jlong
timing::Timer::sinceLast(jstring comment)
{
  jlong old_last_time = this->last_time;
  jlong new_last_time = calculate_new_time();
  this->last_time = new_last_time;
  this->last_comment = comment;
  return new_last_time - old_last_time;
}
This uses the following automatically-generated timing_Timer.h:
#include <kni.h> // "Kaffe Native Interface"
class timing {
public:
  class Timer : public java::lang::Object {
    jlong last_time;
    jstring last_comment;
  public:
    virtual jlong sinceLast(jstring comment);
  };
};

Utility macros

Whether or not we are using the JNI, we still need a toolkit of utility functions so C++ code can request various services of the VM. For operations that have a direct correspondence in C++ (such as accessing an instance field or throwing an exception), we want to use the C++ facility. For other features, such as creating a Java string from a nul-terminated C string, we need utility functions. In such cases we define a set of interfaces that have similar names and functionality as the JNI functions, except that they do not depend on a JNIEnv pointer.

For example, the JNI interface to get a Java string from a C string is the following in C:

jstring str = (*env)->NewStringUTF(env, "Hello");
and the following in C++:
jstring str = env->NewStringUTF("Hello");
(The C++ interface is just a set of inline methods that wrap the C interface.)

In the KNI, we do not use a JNIEnv pointer, so the usage is:

jstring str = JvNewStringUTF("Hello");
We use the prefix Jv to indicate the KNI facilities.

It is useful to be able to conditionally compile the same source to use either the fast KNI or the portable JNI. That is possible, with some minor inconvenience, because when USE_JNI is defined, the Jv features are defined as macros that expand to JNI functions:

#if USE_JNI
#define JNIENV() JvEnv /* Must be available in scope. */
#define JvNewStringUTF(BYTES) \
  ((JNIENV())->NewStringUTF(BYTES))
#else /* ! USE_JNI */
extern "C" jstring JvNewStringUTF (const char*);
#endif /* ! USE_JNI */

Field accesses are trickier. When using JNI, we have to use a jfieldID, but when using KNI we can access the field directly. We require that the programmer use a convention where the jfieldID used to access a field named foo is foo_id.

#if USE_JNI
#define JvGetLongField(OBJ, FIELD) \
  (JNIENV()->GetLongField(OBJ, FIELD##_id))
#else
#define JvGetLongField(OBJ, FIELD) ((OBJ)->FIELD)
#endif

Here is how we can write the earlier example to support either interface:

#if USE_JNI
extern "C" jlong
Java_timing_Timer_sinceLast (JNIEnv *JvEnv, jobject JvThis,
                             jstring comment)
#else
jlong
timing::Timer::sinceLast(jstring comment)
#endif
{
#if USE_JNI
  jclass cls = JvEnv->FindClass("timing/Timer");
  jfieldID last_time_id = JvEnv->GetFieldID(cls, "last_time", "J");
  jfieldID last_comment_id = JvEnv->GetFieldID(cls, "last_comment",
                                               "Ljava/lang/String;");
#else
  timing::Timer *JvThis = this;  /* assumed: the object is named JvThis in both modes */
#endif
  jlong old_last_time = JvGetLongField(JvThis, last_time);
  jlong new_last_time = calculate_new_time();
  JvSetLongField(JvThis, last_time, new_last_time);
  JvSetObjectField(JvThis, last_comment, comment);
  return new_last_time - old_last_time;
}
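
The foo/foo_id convention can be tried out without a VM. The following is only a sketch: Timer, MockEnv, and FieldId are hypothetical stand-ins invented for this illustration (the mock field id is a C++ pointer to member, which is not how the JNI represents a jfieldID). It shows that the same source line compiles under either setting of USE_JNI.

```cpp
#include <cassert>

// Set to 1 to route field access through the mock JNI-style environment.
#ifndef USE_JNI
#define USE_JNI 0
#endif

// Hypothetical object and environment; they only mimic the shape of the real APIs.
struct Timer { long long last_time; };
typedef long long Timer::*FieldId;   // mock field id: a pointer to member
struct MockEnv {
  long long GetLongField(Timer *obj, FieldId id) { return obj->*id; }
};

#if USE_JNI
#define JNIENV() JvEnv /* Must be available in scope. */
#define JvGetLongField(OBJ, FIELD) \
  (JNIENV()->GetLongField(OBJ, FIELD##_id))
#else
#define JvGetLongField(OBJ, FIELD) ((OBJ)->FIELD)
#endif

// The body is written once; only the set-up differs between the two modes.
long long readLastTime(Timer *t) {
#if USE_JNI
  MockEnv env;
  MockEnv *JvEnv = &env;
  FieldId last_time_id = &Timer::last_time;  // follows the foo -> foo_id convention
#endif
  return JvGetLongField(t, last_time);
}

// Small usage check.
long long demoRead() {
  Timer t = { 42 };
  long long v = readLastTime(&t);
  assert(v == 42);
  return v;
}
```

Compiling the same file with -DUSE_JNI=1 exercises the token-pasting branch instead of the direct field access.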

Using the C language

Some programmers might prefer to write Java native methods using C. The main advantages of that are that C is more universally available and more portable. However, if portability to multiple Java implementations is important, one should use the JNI. Still, it might be nice to have Jv-style macros that would allow one to select between portable JNI-based C, or Kaffe-optimized KNI. The problem is that an efficient KNI-style interface is much more inconvenient in C than in C++. In C++, we can have the compiler handle inheritance, exception handling, name mangling of methods, and so on. In C the programmer would have to do much more of this by hand. It should be possible to come up with a set of macros for programmers willing to do that. I am not convinced that this is a high priority, given that most environments that support C and Java will also support C++. The main issue is whether it is OK to require a C++ compiler to build the Kaffe native methods. If using C++ makes it easier to write core Java libraries more efficiently, I think the trade-off is worth it.

Packages

The only global names in Java are class names and packages. A package can contain zero or more classes, and also zero or more sub-packages. Every class belongs to either an unnamed package or a package that has a hierarchical and globally unique name.

A Java package is mapped to a C++ namespace. The Java class java.lang.String is in the package java.lang, which is a sub-package of java. The C++ equivalent is the class java::lang::String, which is in the namespace java::lang, which is in the namespace java.

The suggested way to do that is:

// Declare the class(es), possibly in a header file:
namespace java {
  namespace lang {
    class Object;
    class String;
  }
}

class java::lang::String : public java::lang::Object
{
  ...
};

Leaving out package names

Having to always type the fully-qualified class name is verbose. It also makes it more difficult to change the package containing a class. The Java package declaration specifies that the following class declarations are in the named package, without having to explicitly name the full package qualifiers. The package declaration can be followed by zero or more import declarations, which allows either a single class or all the classes in a package to be named by a simple identifier. C++ provides something similar with the using declaration and directive.

A Java simple-type-import declaration:

import PackageName.TypeName;
allows using TypeName as a shorthand for PackageName.TypeName. The C++ (more-or-less) equivalent is a using-declaration:
using PackageName::TypeName;

A Java import-on-demand declaration:

import PackageName.*;
allows using any TypeName declared in the package as a shorthand for PackageName.TypeName. The C++ (more-or-less) equivalent is a using-directive:
using namespace PackageName;
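
As a rough illustration of the two forms, the sketch below uses a made-up namespace demo_pkg::util with two made-up classes (none of these names come from a real Java library). One function relies on a using-declaration, the other on a using-directive.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Java package; these names are illustrative only.
namespace demo_pkg {
  namespace util {
    struct Stack  { std::string name() const { return "demo_pkg.util.Stack"; } };
    struct Vector { std::string name() const { return "demo_pkg.util.Vector"; } };
  }
}

// Counterpart of "import demo_pkg.util.Stack;": only Stack becomes a simple name.
std::string singleTypeImport() {
  using demo_pkg::util::Stack;
  Stack s;
  return s.name();
}

// Counterpart of "import demo_pkg.util.*;": all names in the namespace are visible.
std::string onDemandImport() {
  using namespace demo_pkg::util;
  Stack s;
  Vector v;
  return s.name() + " " + v.name();
}

// Small usage check.
void checkImports() {
  assert(singleTypeImport() == "demo_pkg.util.Stack");
  assert(onDemandImport() == "demo_pkg.util.Stack demo_pkg.util.Vector");
}
```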

Nested classes as a substitute for namespaces

G++ does not implement namespaces yet. However, it does implement nested classes, which provide similar (though less convenient) functionality. This style seems to work:

class java {
public:
  class lang {
  public:
    class Object { } ;
    class String;
  };
};

class java::lang::String : public java::lang::Object
{ ... }
Note that the generated code (including name mangling) using nested classes is the same as that using namespaces.

Object model

From an implementation point of view we can consider Java to be a subset of C++. Java has a few important extensions, plus a powerful standard class library, but on the whole that does not change the basic similarity. Java is a hybrid object-oriented language, with a few native types, in addition to class types. It is class-based, where a class may have static as well as per-object fields, and static as well as instance methods. Non-static methods may be virtual, and may be overloaded. Overloading is resolved at compile time by matching the actual argument types against the parameter types. Virtual methods are implemented using indirect calls through a dispatch table (virtual function table). Objects are allocated on the heap, and initialized using a constructor method. Classes are organized in a package hierarchy.

All of the listed attributes are also true of C++, though C++ has extra features (for example in C++ objects may also be allocated statically or in a local stack frame in addition to the heap). So the most important task in integrating Java and C++ is to remove gratuitous incompatibilities.

Object references

We implement a Java object reference as a pointer to the start of the referenced object. It maps to a C++ pointer. (We cannot use C++ references for Java references, since once a C++ reference has been initialized, you cannot change it to point to another object.) The null Java reference maps to the NULL C++ pointer.
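
The difference between the two C++ constructs can be seen in a few lines. This is a minimal sketch; Node is a hypothetical stand-in for a Java object and is not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Java object; not the real java::lang::Object.
struct Node { int value; };

// A Java-style reference variable is modelled as a pointer: it can be null
// and it can be re-pointed at a different object.
int reseatPointer() {
  Node a = { 1 };
  Node b = { 2 };
  Node *ref = NULL;    // corresponds to Java's null
  ref = &a;            // ref now refers to a
  ref = &b;            // re-pointing is allowed, as in Java
  return ref->value;   // reads b.value
}

// A C++ reference is bound once; assigning through it copies into the
// original target instead of re-pointing.
int assignThroughReference() {
  Node a = { 1 };
  Node b = { 2 };
  Node &ref = a;       // bound to a for its whole lifetime
  ref = b;             // copies b into a; ref still names a
  return a.value;      // a was overwritten with b's contents
}

// Small usage check.
void checkReferences() {
  assert(reseatPointer() == 2);
  assert(assignThroughReference() == 2);
}
```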

Note that in JDK an object reference is implemented as a pointer to a two-word "handle". One word of the handle points to the fields of the object, while the other points to a method table. GNU Java does not use this extra indirection.

Primitive types

Java provides 8 "primitive" types: byte, short, int, long, float, double, char, and boolean. These are the same as the following C++ typedefs (which are defined in a standard header file): jbyte, jshort, jint, jlong, jfloat, jdouble, jchar, and jboolean.

Java type   C/C++ typename   Description
byte        jbyte            8-bit signed integer
short       jshort           16-bit signed integer
int         jint             32-bit signed integer
long        jlong            64-bit signed integer
float       jfloat           32-bit IEEE floating-point number
double      jdouble          64-bit IEEE floating-point number
char        jchar            16-bit Unicode character
boolean     jboolean         logical (Boolean) values
void        void             no value
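
One plausible way to write such typedefs is sketched below. The underlying types chosen here are an assumption made for illustration (the real header may define them differently), and the size checks assume a platform with 8-bit bytes and IEEE floating point.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Assumed definitions for illustration; the real header may spell these differently.
typedef std::int8_t   jbyte;
typedef std::int16_t  jshort;
typedef std::int32_t  jint;
typedef std::int64_t  jlong;
typedef float         jfloat;
typedef double        jdouble;
typedef std::uint16_t jchar;
typedef bool          jboolean;

// Compile-time checks that the widths match the table.
static_assert(sizeof(jbyte)   * CHAR_BIT == 8,  "byte is 8 bits");
static_assert(sizeof(jshort)  * CHAR_BIT == 16, "short is 16 bits");
static_assert(sizeof(jint)    * CHAR_BIT == 32, "int is 32 bits");
static_assert(sizeof(jlong)   * CHAR_BIT == 64, "long is 64 bits");
static_assert(sizeof(jfloat)  * CHAR_BIT == 32, "float is 32 bits");
static_assert(sizeof(jdouble) * CHAR_BIT == 64, "double is 64 bits");
static_assert(sizeof(jchar)   * CHAR_BIT == 16, "char is 16 bits");

// The integer types are signed, except for char, which is unsigned.
bool signednessMatchesJava() {
  return jbyte(-1) < 0 && jshort(-1) < 0 && jint(-1) < 0 &&
         jlong(-1) < 0 && jchar(-1) > 0;
}

// Small usage check.
void checkPrimitives() {
  assert(signednessMatchesJava());
}
```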

Object fields

Each object contains an object header, followed by the instance fields of the class, in order. The object header consists of a single pointer to a dispatch or virtual function table. (There may be extra fields "in front of" the object, for example for memory management, but this is invisible to the application, and the reference to the object points to the dispatch table pointer.)

The fields are laid out in the same order, alignment, and size as in C++. Specifically, 8-bit and 16-bit native types (byte, short, char, and boolean) are not widened to 32 bits. Note that the Java VM does extend 8-bit and 16-bit types to 32 bits when on the VM stack or in temporary registers. The JDK implementation and earlier versions of Kaffe also extend 8-bit and 16-bit object fields to use a full 32 bits. However, GNU Java was recently changed so that 8-bit and 16-bit fields now only take 8 or 16 bits in an object. In general, Java field sizes and alignment are now the same as in C and C++.
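
The effect of not widening can be checked with offsetof. The two structs below are hypothetical models (a single pointer as the object header, followed by three fields); the exact offsets depend on the platform ABI, so the checks only compare relative sizes and ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical object layout: a one-pointer header followed by the fields in order.
struct Sample {
  const void  *vtable;   // stands in for the dispatch-table pointer
  std::int8_t  b;        // Java byte
  std::int16_t s;        // Java short
  std::int32_t i;        // Java int
};

// The same fields, each widened to 32 bits as described for older VMs.
struct SampleWidened {
  const void  *vtable;
  std::int32_t b;
  std::int32_t s;
  std::int32_t i;
};

void checkLayout() {
  // The header comes first and the fields follow in declaration order.
  assert(offsetof(Sample, vtable) == 0);
  assert(offsetof(Sample, b) < offsetof(Sample, s));
  assert(offsetof(Sample, s) < offsetof(Sample, i));
  // The narrow fields keep their natural sizes.
  assert(sizeof(Sample::b) == 1);
  assert(sizeof(Sample::s) == 2);
  // The short starts less than a 32-bit word after the byte.
  assert(offsetof(Sample, s) - offsetof(Sample, b) < 4);
  // Without widening, the object is no larger than the widened variant.
  assert(sizeof(Sample) <= sizeof(SampleWidened));
}
```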

Arrays

While in many ways Java is similar to C and C++, it is quite different in its treatment of arrays. C arrays are based on the idea of pointer arithmetic, which would be incompatible with Java's security requirements. Java arrays are true objects (array types inherit from java.lang.Object). An array-valued variable is one that contains a reference (pointer) to an array object.

Referencing a Java array in C++ code is done using the JArray template, which is defined as follows:

class __JArray : public java::lang::Object
{
public:
  int length;
};

template<class T>
class JArray : public __JArray
{
  T data[0];
public:
  T& operator[](jint i) { return data[i]; }
};
The following convenience typedefs (matching JNI) are provided.
typedef __JArray *jarray;
typedef JArray<jobject> *jobjectArray;
typedef JArray<jboolean> *jbooleanArray;
typedef JArray<jbyte> *jbyteArray;
typedef JArray<jchar> *jcharArray;
typedef JArray<jshort> *jshortArray;
typedef JArray<jint> *jintArray;
typedef JArray<jlong> *jlongArray;
typedef JArray<jfloat> *jfloatArray;
typedef JArray<jdouble> *jdoubleArray;
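
To make the "length followed by elements" idea concrete, here is a standalone sketch that does not depend on the zero-length array extension. SketchArray, ArrayHeader, create, destroy, and at are hypothetical names invented for this illustration. The object header is omitted, memory comes from calloc rather than a garbage-collected heap, and the bounds check is added here because a Java array access must be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical header shared by all array objects (the object header is omitted).
struct ArrayHeader {
  std::int32_t length;
};

// Sketch of a variable-sized array object: header first, elements right after.
template <class T>
class SketchArray : public ArrayHeader {
  // Offset of element 0, rounded up to the alignment of T.
  static std::size_t dataOffset() {
    return (sizeof(ArrayHeader) + alignof(T) - 1) / alignof(T) * alignof(T);
  }
  T *data() {
    return reinterpret_cast<T *>(reinterpret_cast<char *>(this) + dataOffset());
  }
public:
  // One zero-filled allocation holds both the length and the elements.
  static SketchArray *create(std::int32_t n) {
    void *mem = std::calloc(1, dataOffset() + static_cast<std::size_t>(n) * sizeof(T));
    if (mem == NULL) throw std::bad_alloc();
    SketchArray *a = static_cast<SketchArray *>(mem);
    a->length = n;
    return a;
  }
  static void destroy(SketchArray *a) { std::free(a); }

  // Bounds-checked access; the unsigned compare also rejects negative indices.
  T &at(std::int32_t i) {
    if (static_cast<std::uint32_t>(i) >= static_cast<std::uint32_t>(length))
      throw std::out_of_range("array index out of bounds");
    return data()[i];
  }
};

// Usage: fill a 4-element array with squares, sum them, then probe one past the end.
int sumOfSquares() {
  SketchArray<std::int64_t> *a = SketchArray<std::int64_t>::create(4);
  for (std::int32_t i = 0; i < a->length; i++) a->at(i) = i * i;
  int sum = 0;
  for (std::int32_t i = 0; i < a->length; i++) sum += static_cast<int>(a->at(i));
  bool threw = false;
  try { a->at(4); } catch (const std::out_of_range &) { threw = true; }
  SketchArray<std::int64_t>::destroy(a);
  assert(threw);
  return sum;   // 0 + 1 + 4 + 9
}
```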

Overloading

Both Java and C++ provide method overloading, where multiple methods in a class have the same name, and the correct one is chosen (at compile time) depending on the argument types. The rules for choosing the correct method are (as expected) more complicated in C++ than in Java, but the fundamental idea is the same. We do have to make sure that all the typedefs for Java types map to distinct C++ types.

Common assemblers and linkers are not aware of C++ overloading, so the standard implementation strategy is to encode the parameter types of a method into its assembly-level name. This encoding is called mangling, and the encoded name is the mangled name. The same mechanism is used to implement Java overloading. For C++/Java interoperability, it is important to use the same encoding scheme. (This is already implemented in jc1, except for some minor necessary adjustments.)
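
The requirement that the Java typedefs map to distinct C++ types can be checked directly. In the sketch below the typedefs are assumed definitions (the real header may differ) and Describer is a made-up class with one overload per integral Java type. If any two typedefs named the same C++ type, the class would fail to compile because two overloads would collide.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Assumed typedefs for illustration; the real header may define them differently.
typedef std::int8_t   jbyte;
typedef std::int16_t  jshort;
typedef std::int32_t  jint;
typedef std::int64_t  jlong;
typedef std::uint16_t jchar;
typedef bool          jboolean;

// Types of equal width must still be different C++ types.
static_assert(!std::is_same<jchar, jshort>::value, "char and short must differ");
static_assert(!std::is_same<jboolean, jbyte>::value, "boolean and byte must differ");

// Hypothetical class with one overload per type; each overload has a
// different parameter type and therefore a different encoded name.
struct Describer {
  std::string describe(jbyte)    { return "byte"; }
  std::string describe(jshort)   { return "short"; }
  std::string describe(jint)     { return "int"; }
  std::string describe(jlong)    { return "long"; }
  std::string describe(jchar)    { return "char"; }
  std::string describe(jboolean) { return "boolean"; }
};

// The overload is chosen at compile time from the static argument type.
void checkOverloads() {
  Describer d;
  assert(d.describe(jbyte(1)) == "byte");
  assert(d.describe(jshort(1)) == "short");
  assert(d.describe(jint(1)) == "int");
  assert(d.describe(jlong(1)) == "long");
  assert(d.describe(jchar('A')) == "char");
  assert(d.describe(jboolean(true)) == "boolean");
}
```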

Virtual method calls

Virtual method dispatch is handled essentially the same in C++ and Java -- i.e. by doing an indirect call through a function pointer stored in a per-class virtual function table. C++ is more complicated because it has to support multiple inheritance. Traditionally, this is implemented by putting an extra delta integer offset in each entry in the virtual function table. This is not needed for Java, which only needs a single function pointer in each entry of the virtual function table. There is a more modern C++ implementation technique, which uses thunks, which does away with the need for the delta fields in the virtual function tables. This is now an option in G++, and will soon be the default on Linux. We need to make sure that Java classes (i.e. those that inherit from java.lang.Object) are implemented as if using thunks. (No actual thunks are needed for Java classes, since Java does not have multiple inheritance.)

The first one or two elements of the virtual function table are used for special purposes in both GNU Java and C++; in Java, it points to the class that owns the virtual function table. G++ needs to know that Java is slightly different.
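
The single-inheritance dispatch scheme described above can be modelled by hand. Everything in this sketch (Obj, VTable, the slot numbers, the function names) is hypothetical, and it leaves out the special leading entries of the table. It only shows that each table entry is one plain function pointer, with no delta field.

```cpp
#include <cassert>

// Hand-written model of single-inheritance dispatch; all names are hypothetical.
struct Obj;
typedef int (*Method)(Obj *self);

struct VTable {
  Method methods[2];     // one plain function pointer per virtual method
};

struct Obj {
  const VTable *vtable;  // object header: a single pointer to the class's table
  int field;
};

static int baseArea(Obj *self)    { return self->field; }
static int baseId(Obj *)          { return 1; }
static int derivedArea(Obj *self) { return self->field * 2; }  // overrides slot 0

static const VTable baseTable    = { { baseArea, baseId } };
static const VTable derivedTable = { { derivedArea, baseId } };  // inherits slot 1

// A virtual call: load the table from the object, index it, call indirectly.
int callVirtual(Obj *o, int slot) { return o->vtable->methods[slot](o); }

// Small usage check.
void checkDispatch() {
  Obj base    = { &baseTable, 10 };
  Obj derived = { &derivedTable, 10 };
  assert(callVirtual(&base, 0) == 10);     // base implementation
  assert(callVirtual(&derived, 0) == 20);  // overriding implementation
  assert(callVirtual(&derived, 1) == 1);   // inherited entry
}
```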

Allocation

New Java objects are allocated using a class-instance-creation-expression:

new Type ( arguments )
The same syntax is used in C++. The main difference is that C++ objects have to be explicitly deleted, while in Java they are automatically deleted by the garbage collector. For a specific class, we can define in C++ operator new:
class CLASS {
public:
  void* operator new (size_t size) { return soft_new(MAGIC); }
};
However, we don't want a user to have to define this magic operator new for each class. It needs to be done in java.lang.Object. This is not possible without some compiler support (because the MAGIC argument is class-dependent); however, it is straight-forward to implement such support. Allocating an array is a special case, since the space needed depends on the run-time length given.
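
The limitation can be seen in plain C++. In this sketch, Base is a hypothetical stand-in for java.lang.Object and malloc stands in for the real allocator. An operator new defined once in the base class is inherited by every subclass and is told the size of the object being created, but nothing about which class it is, which is why the class-dependent argument needs compiler help.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical base class standing in for java.lang.Object.
class Base {
public:
  // Inherited by every subclass; size is the size of the class being created.
  // Only the size is available here, not the identity of the class.
  void *operator new(std::size_t size) {
    void *mem = std::malloc(size);
    if (mem == NULL) throw std::bad_alloc();
    lastSize() = size;
    return mem;
  }
  void operator delete(void *mem) { std::free(mem); }

  // Records the size passed to the most recent allocation, for the demo below.
  static std::size_t &lastSize() { static std::size_t n = 0; return n; }

  virtual ~Base() {}
};

class Derived : public Base {
public:
  int payload[8];
};

// Usage: the inherited operator new sees the derived class's size.
bool derivedUsesInheritedNew() {
  Base *obj = new Derived;
  bool ok = (Base::lastSize() == sizeof(Derived));
  delete obj;   // no garbage collector in this sketch
  assert(ok);
  return ok;
}
```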

Object construction

In both C++ and Java, newly created objects are initialized by a constructor. In both languages, a constructor is a method that is automatically called. Java has some restrictions on how constructors are called, but basically the calling convention (and overload resolution) are as for standard methods. In G++, constructors get passed an extra magic argument, which is not passed for Java constructors. G++ also has the constructors set up the vtable pointers. In Java, the object allocator sets up the vtable pointer, and the constructor does not change the vtable pointer. Hence, the G++ compiler needs to know about these differences.

Object finalization

A Java method with the special name finalize serves some of the same function as a C++ destructor method. The latter is responsible for freeing up any resources owned by the object before it is destroyed, including deleting any sub-objects it points to. In Java, the garbage collector will take care of deleting no-longer-needed sub-objects, so there is much less need for finalization, but it is occasionally needed.

It might make sense to consider the C++ syntax for a finalizer: ~ClassName as being equivalent to the Java finalize method. That would mean that if a class that inherits from java.lang.Object defined a C++-style destructor, it would be equivalent to defining a finalize method. However, I see no useful purpose served by doing that. Instead: If you want to define or invoke a Java finalizer from C++ code, you will need to define or invoke a method named finalize.

Interfaces

A Java class can implement zero or more interfaces, in addition to inheriting from a single base class. An interface is a collection of constants and method specifications; it is similar to the signatures available as a G++ extension. An interface provides a subset of the functionality of C++ abstract virtual base classes, but is normally implemented differently. Since the mechanism used to implement interfaces in GNU Java will change, and since interfaces are infrequently used by Java native methods, we will not say anything more about them now.

Exceptions

It is a goal of the Gcc exception handling mechanism that it be language independent as far as possible. The existing support is geared towards C++, but should be extended for Java. Essentially, the Java features are a subset of the G++ features, in that C++ allows near-arbitrary values to be thrown, while Java only allows throwing of references to objects that inherit from java.lang.Throwable. So once the Gcc exception handling is more stable, it should be trivial to add Java support. The main change needed for Java is how type-matching is done; fixing that would benefit C++ as well. The main other issue is that we need to make Kaffe's representation of exception ranges be compatible with Gcc's.

The goal is that C++ code that needs to throw a Java exception would just use the C++ throw statement. For example:

throw new java::io::IOException(JvNewStringUTF("I/O Error!"));

There is also no difference between catching a Java exception, and catching a C++ exception. The following Java fragment:

try {
  do_stuff();
} catch (java.io.IOException ex) {
  System.out.println("caught I/O Error");
} finally {
  cleanup();
}
could be expressed this way in G++:
try {
  try {
    do_stuff();
  } catch (java::io::IOException *ex) {
    printf("caught I/O Error\n");
  }
} catch (...) {
  cleanup();
  throw;  // re-throws exception
}
cleanup();  // the normal (non-exceptional) path must also run the cleanup
Note that in C++ we need to use two nested try statements.
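
The nested-try pattern can be exercised in isolation. The classes in this sketch are hypothetical stand-ins for java::lang::Throwable and java::io::IOException, exceptions are thrown as pointers to mirror Java references, and the handlers delete them explicitly because there is no garbage collector here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the Java exception classes.
struct Throwable { virtual ~Throwable() {} };
struct IOException : Throwable {};
struct OtherException : Throwable {};

// The try/catch/finally pattern. kind: 0 = no exception,
// 1 = IOException, 2 = some other exception.
void guarded(int kind, std::string &log) {
  try {
    try {
      if (kind == 1) throw new IOException();
      if (kind == 2) throw new OtherException();
      log += "body ";
    } catch (IOException *ex) {   // Java references are pointers, so catch a pointer
      log += "caught ";
      delete ex;                  // no garbage collector in this sketch
    }
  } catch (...) {
    log += "cleanup ";            // "finally" on the exceptional path
    throw;                        // re-throws the same exception
  }
  log += "cleanup ";              // "finally" on the normal path
}

// A caller that records whether anything propagated out of guarded().
std::string run(int kind) {
  std::string log;
  try {
    guarded(kind, log);
  } catch (Throwable *ex) {       // matches any pointer to a derived exception
    log += "propagated";
    delete ex;
  }
  return log;
}

// Small usage check.
void checkFinally() {
  assert(run(0) == "body cleanup ");
  assert(run(1) == "caught cleanup ");
  assert(run(2) == "cleanup propagated");
}
```

In all three cases the cleanup step runs exactly once, which is the behaviour the Java finally clause guarantees.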

Synchronization

Each Java object has an implicit monitor. The Java VM uses the instruction monitorenter to acquire and lock a monitor, and monitorexit to release it. The JNI has corresponding methods MonitorEnter and MonitorExit. The corresponding KNI macros are JvMonitorEnter and JvMonitorExit.

The Java source language does not provide direct access to these primitives. Instead, there is a synchronized statement that does an implicit monitorenter before entry to the block, and does a monitorexit on exit from the block. Note that the lock has to be released even if the block is abnormally terminated by an exception, which means there is an implicit try-finally.

From C++, it makes sense to use a destructor to release a lock. KNI defines the following utility class.

class JvSynchronize {
  jobject obj;
public:
  JvSynchronize(jobject o) { obj = o; JvMonitorEnter(o); }
  ~JvSynchronize() { JvMonitorExit(obj); }
};
The equivalent of Java's:
synchronized (OBJ) { CODE; }
can be simply expressed:
{ JvSynchronize dummy(OBJ); CODE; }
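
That the destructor really does release the lock on an abnormal exit can be shown with a mock. Monitor, monitorEnter, monitorExit, and Synchronize below are hypothetical stand-ins with the same shape as the KNI facilities; the mock monitor only counts and does not block other threads.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical monitor that just counts; a real one would block other threads.
struct Monitor { int depth; int releases; };
void monitorEnter(Monitor *m) { m->depth++; }
void monitorExit(Monitor *m)  { m->depth--; m->releases++; }

// Same shape as JvSynchronize: lock in the constructor, unlock in the destructor.
class Synchronize {
  Monitor *obj;
public:
  explicit Synchronize(Monitor *o) : obj(o) { monitorEnter(obj); }
  ~Synchronize() { monitorExit(obj); }
private:
  Synchronize(const Synchronize &);             // copying would unlock twice
  Synchronize &operator=(const Synchronize &);
};

// The block exits by exception, yet the destructor still releases the monitor.
bool releasedOnException(Monitor *m) {
  bool heldInside = false;
  try {
    Synchronize dummy(m);
    heldInside = (m->depth == 1);
    throw std::runtime_error("abnormal exit");
  } catch (const std::runtime_error &) {
    // dummy has already been destroyed by stack unwinding at this point
  }
  return heldInside && m->depth == 0 && m->releases == 1;
}

// Small usage check.
bool demoSynchronize() {
  Monitor m = { 0, 0 };
  bool ok = releasedOnException(&m);
  assert(ok);
  return ok;
}
```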

Java also has methods with the synchronized attribute. This is equivalent to wrapping the entire method body in a synchronized statement. Alternatively, the synchronization can be done by the caller wrapping the method call in a synchronized statement. That implementation is not practical for virtual method calls in compiled code, since it would require the caller to check at run-time for the synchronized attribute. Hence our implementation of Java will have the called method do the synchronization inline.

Improved String implementation

The standard Java implementation of strings is a bit inefficient, because every string requires two objects: a java.lang.String object, which contains a reference to an internal char array, which in turn contains the actual character data. If we allow the actual java.lang.String object to have a size that varies depending on how many characters it contains (just like array objects vary in size), we can save the overhead of the extra object. This would save space, reduce cache misses, and reduce garbage collection overhead.

class java::lang::String : public java::lang::Object
{
  jint length;  /* In characters. */
  jint offset;  /* In bytes, from start of base. */
  Object *base; /* Either this or another String or a char array. */

private:
  jchar& operator[](jint i) { return ((jchar*)((char*)base+offset))[i]; }

public:
  jchar charAt(jint i)
  {
    if ((unsigned32) i >= length)
      throw new IndexOutOfBoundsException(i);
    return (*this)[i];
  }

  String* substring (jint beginIndex, jint endIndex)
  {
    ...  check for errors ...;
    String *s = new String();
    s->base = base;
    s->length = endIndex - beginIndex;
    /* Byte offset of the first shared character, measured from base. */
    s->offset = (char*) &(*this)[beginIndex] - (char*) base;
    return s;
  }
  ...
};

The tricky part about variable-sized objects is that we can no longer cleanly separate object allocation from object construction, since the size of the object to be allocated depends on the arguments given to the constructor. We can deal with this fairly straight-forwardly from C++ or when compiling Java source code. It is more complicated (though quite doable) when compiling from Java byte-code. We don't have to worry about that, since in any case we have to support the less efficient scheme with separate allocation and construction. (This is needed for JNI and reflection compatibility.)

Changes needed to G++

Here is a list of tweaks needed to G++ before it can provide the C++/Java interoperability we have discussed: