There are many Smalltalk applications that might benefit from being ported to Java. Several approaches are of interest, including automatic or manual translation from Smalltalk to Java, and hosting Smalltalk on top of Java.
The Smalltalk language itself is fairly simple, but Smalltalk includes an extensive set of standard classes. Converting a Smalltalk program includes providing an environment that contains converted versions of the standard Smalltalk classes. Among these classes are a number of "system" classes that interact with the implementation, such as Class, Behavior, CompiledMethod, and so on. So one big question is to what extent we want these to behave as in a Smalltalk environment, or as in a Java environment.
A Smalltalk "virtual machine" is typically implemented as a collection of "system" Smalltalk classes, plus various primitive methods implemented in a lower-level language. Smalltalk source for the system classes is freely available from various places, such as GNU Smalltalk. Since we must be able to compile Smalltalk user code anyway, it is reasonable to implement our virtual machine by just compiling an existing set of system classes, and then writing the primitive methods in Java. Such an approach provides very good Smalltalk support, but may provide less integration of Java objects with Smalltalk objects.
    package ST;
    public class Foo extends ST.Bar {
        public ST.Object a;
        public ST.Object b;
    }

Note: I assume it is not necessary to support the become: primitive. Doing so would require either an extra level of indirection (which would make the result less efficient and less Java-like), or a very Java-implementation-specific native method implemented in C.
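To make the cost of supporting become: concrete, here is a minimal sketch of the "extra level of indirection" alternative. The names Handle, Body, and BecomeDemo are hypothetical and are not part of the translation scheme discussed in this note; the sketch only assumes that every Smalltalk reference would be represented by a handle.

```java
// Hypothetical sketch: every Smalltalk reference is a Handle, and the
// real state lives in a Body that the handle points to.
class Body {
    final String description;
    Body(String description) { this.description = description; }
}

class Handle {
    Body body;
    Handle(Body body) { this.body = body; }

    // Every access pays for one extra pointer dereference.
    String describe() { return body.description; }

    // become: exchanges the identities of two objects by swapping their
    // bodies, so every existing reference to either handle sees the change.
    void become(Handle other) {
        Body tmp = this.body;
        this.body = other.body;
        other.body = tmp;
    }
}

class BecomeDemo {
    public static void main(String[] args) {
        Handle a = new Handle(new Body("first"));
        Handle b = new Handle(new Body("second"));
        Handle alias = a;   // another reference to the same object
        a.become(b);
        System.out.println(alias.describe());
    }
}
```

The fields of a translated class could then no longer be plain Java fields of the object that user code holds, which is why this note prefers to leave become: unsupported.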
The simplest translation of a Smalltalk class Foo is that the instance methods of Foo become regular methods of ST.Foo, and the class methods of Foo become the static methods of ST.Foo.
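As an illustration of this direct mapping, here is a small sketch. The Counter class, its selectors, and the use of a plain int field are all hypothetical choices made to keep the example short; they are not taken from any real Smalltalk image.

```java
// Hypothetical translation of a Smalltalk class Counter with
//   class method:      Counter class >> startingAt: n
//   instance methods:  increment, value
class Counter {
    private int count;

    // Class method: becomes a static method.
    static Counter startingAt(int n) {
        Counter c = new Counter();
        c.count = n;
        return c;
    }

    // Instance methods: become regular (virtual) methods.
    Counter increment() {
        count++;
        return this;
    }

    int value() {
        return count;
    }
}
```

One thing to keep in mind with this mapping is that Java static methods are resolved at compile time, so Smalltalk code that relies on overriding class methods in a subclass would need extra care.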
Unfortunately, this does not work if you want to be able to dynamically change, add, or delete methods belonging to an existing class. In that case you have to separate out the data structures that implement the methods from the data structures that implement objects and classes. The natural way to do that is to have a separate class for each method. This class would have an internal name generated to be unique, and (in the case of a re-definition) would be loaded using a Java ClassLoader.
One scheme would use a standard CompiledMethod class:

    abstract class CompiledMethod extends ST.Object {
        ST.Symbol name;
        ST.Class clas;
        abstract public ST.Object doitN(ST.Object receiver, ST.Object[] args);
    }

and each method would be compiled into a sub-class of CompiledMethod:

    class Foo_foo extends CompiledMethod {
        public static ST.Object foo(ST.Foo receiver, ST.Object arg1) {
            ... compiled code for method foo ...
        }
        public ST.Object doitN(ST.Object receiver, ST.Object[] args) {
            return foo((ST.Foo) receiver, args[0]);
        }
    }
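The point of making each method its own object is that methods can then be added, replaced, or removed at run time. The following is a minimal sketch of how that might look, assuming a per-class method table. StObject and StClass are hypothetical stand-ins for ST.Object and ST.Class, and the Send helper and the table layout are assumptions, not something this note specifies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ST.Object: each object knows its Smalltalk class.
class StObject {
    final StClass stClass;
    StObject(StClass stClass) { this.stClass = stClass; }
}

abstract class CompiledMethod {
    abstract StObject doitN(StObject receiver, StObject[] args);
}

// Hypothetical stand-in for ST.Class.
class StClass {
    final StClass superclass;
    // Methods live in a table, separate from the Java class of the instances.
    private final Map<String, CompiledMethod> methods = new HashMap<>();

    StClass(StClass superclass) { this.superclass = superclass; }

    // Adding or re-defining a method just replaces a table entry.
    void define(String selector, CompiledMethod method) { methods.put(selector, method); }
    void remove(String selector) { methods.remove(selector); }

    // Walk up the superclass chain; null means "does not understand".
    CompiledMethod lookup(String selector) {
        for (StClass c = this; c != null; c = c.superclass) {
            CompiledMethod m = c.methods.get(selector);
            if (m != null) return m;
        }
        return null;
    }
}

class Send {
    static StObject send(StObject receiver, String selector, StObject... args) {
        CompiledMethod m = receiver.stClass.lookup(selector);
        if (m == null) throw new IllegalStateException("doesNotUnderstand: " + selector);
        return m.doitN(receiver, args);
    }
}
```

With this layout, re-defining a method only requires loading one new CompiledMethod sub-class and calling define; existing instances are untouched.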
The selector atAll:put: is associated with the interface:

    package StSelector;
    public interface Has_atAll_put {
        public ST.Object atAll_put(ST.Object arg1, ST.Object arg2);
    }

Then any class that provides a method with the selector atAll:put: would implement the interface Has_atAll_put.
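The mapping from selector to interface has to be computed somewhere, and each interface must be generated only once. Here is a rough sketch of such a registry. The class SelectorInterfaces and its methods are hypothetical, the name mangling simply follows the Has_atAll_put example above, and only unary and keyword selectors are handled (binary selectors such as + would need their own encoding).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SelectorInterfaces {
    // selector -> Java source text of the interface already generated for it
    private final Map<String, String> generated = new LinkedHashMap<>();

    // "atAll:put:" -> "Has_atAll_put"
    static String interfaceName(String selector) {
        String s = selector.endsWith(":")
                ? selector.substring(0, selector.length() - 1)
                : selector;
        return "Has_" + s.replace(':', '_');
    }

    // A keyword selector takes one argument per colon.
    static int argumentCount(String selector) {
        int n = 0;
        for (int i = 0; i < selector.length(); i++) {
            if (selector.charAt(i) == ':') n++;
        }
        return n;
    }

    // Returns true if new source was generated, false if it already existed.
    boolean ensureGenerated(String selector) {
        if (generated.containsKey(selector)) return false;
        String name = interfaceName(selector);
        String methodName = name.substring("Has_".length());
        StringBuilder params = new StringBuilder();
        for (int i = 1; i <= argumentCount(selector); i++) {
            if (i > 1) params.append(", ");
            params.append("ST.Object arg").append(i);
        }
        generated.put(selector,
                "package StSelector;\n"
                + "public interface " + name + " {\n"
                + "    ST.Object " + methodName + "(" + params + ");\n"
                + "}\n");
        return true;
    }

    String sourceFor(String selector) { return generated.get(selector); }
}
```

Note that this naive mangling is not injective: a unary selector named at_put would collide with at:put:, so a real implementation would need a more careful encoding.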
The translation of:

    R atAll: C put: O

is then straightforward:

    ((StSelector.Has_atAll_put)R).atAll_put (C, O)

There are at least four problems with this approach:
doesNotUnderstand is awkward to support. If the receiver does not support a method, a casting exception is thrown. The exception could be caught, and converted to a send of doesNotUnderstand, but that would bloat the generated code substantially.
Maintaining the interfaces in StSelector might be a hassle. Unless you have a fixed code-base (and can pre-generate the complete set of selectors), you have to generate the interface classes as you need them, being careful to not generate a duplicate.
If the compiler can determine that a selector such as foo: is only understood by classes that are sub-classes of Foo, then it can dispense with the interface class, and do a cast to ST.Foo instead:

    ((ST.Foo)R).atAll_put (C, O)
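To show how much code the doesNotUnderstand fallback mentioned above adds to every send, here is a sketch of what a single call site might expand to. StObject, Filler, and SendSite are hypothetical names, and the default doesNotUnderstand simply throws; a real system would presumably build a message object and dispatch it.

```java
// Hypothetical stand-in for ST.Object.
class StObject {
    StObject doesNotUnderstand(String selector, StObject[] args) {
        throw new UnsupportedOperationException("doesNotUnderstand: " + selector);
    }
}

interface Has_atAll_put {
    StObject atAll_put(StObject indices, StObject value);
}

// A hypothetical class that does understand atAll:put:.
class Filler extends StObject implements Has_atAll_put {
    public StObject atAll_put(StObject indices, StObject value) {
        return value;
    }
}

class SendSite {
    // Roughly what would have to be emitted for every  R atAll: C put: O.
    static StObject atAllPut(StObject r, StObject c, StObject o) {
        Has_atAll_put target;
        try {
            // Only the cast is guarded, so a ClassCastException raised
            // inside the method body itself is not mistaken for a failed send.
            target = (Has_atAll_put) r;
        } catch (ClassCastException e) {
            return r.doesNotUnderstand("atAll:put:", new StObject[] { c, o });
        }
        return target.atAll_put(c, o);
    }
}
```

Compared with the one-line cast-and-call, each send grows to a try/catch plus an argument array, which is the bloat the text warns about.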
Advantages of this scheme:
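Another scheme adds a general send method to ST.Object:

    package ST;
    public class Object {
        ...
        public ST.Object sendN(ST.Object receiver, String selector, ST.Object[] args) {
            ... handle a selector that no class understood ...
        }
    }

Then for a class Foo that has methods foo and bar: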
    package ST;
    public class Foo extends ST.Object {
        public ST.Object foo(Object arg) { ... }
        public ST.Object bar(Object arg) { ... }
        public ST.Object sendN(ST.Object receiver, String selector, ST.Object[] args) {
            // We assume all selector Strings have been intern'd.
            if (selector == "foo")
                return foo(args[0]);
            else if (selector == "bar")
                return bar(args[0]);
            else
                return super.sendN(receiver, selector, args);
        }
    }

This is simple. The problem is that it is difficult to make it efficient. There is no easy way to do method caching, since methods are not objects. Some optimizations are possible. For example, we could use separate send methods for different argument counts: send0 would be used for messages with no arguments, send1 for messages with one argument, send2 for messages with two arguments, and sendN for messages with three or more arguments.
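The arity-specific send methods can be sketched as follows. This is an illustration under assumptions: StObject stands in for ST.Object, the receiver parameter is dropped because the receiver is `this`, the selectors size and foo: are made up, and the fallback just throws rather than performing a real doesNotUnderstand send.

```java
// Hypothetical stand-in for ST.Object with arity-specific entry points.
class StObject {
    StObject send0(String selector) { return doesNotUnderstand(selector); }
    StObject send1(String selector, StObject arg) { return doesNotUnderstand(selector); }

    StObject doesNotUnderstand(String selector) {
        throw new UnsupportedOperationException("doesNotUnderstand: " + selector);
    }
}

class Foo extends StObject {
    StObject size() { return this; }
    StObject foo(StObject arg) { return arg; }

    @Override
    StObject send0(String selector) {
        // Selectors are assumed to be interned, so == is a valid comparison.
        if (selector == "size") return size();
        return super.send0(selector);
    }

    @Override
    StObject send1(String selector, StObject arg) {
        // No argument array is allocated for a one-argument message.
        if (selector == "foo:") return foo(arg);
        return super.send1(selector, arg);
    }
}
```

The saving is that the common short messages avoid allocating and indexing an argument array on every send.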
Another optimization is to identify selectors by integer codes, so that the dispatch becomes a switch:

    package ST;
    public class T extends S {
        public ST.Object foo(Object arg) { ... }
        public ST.Object bar(Object arg) { ... }
        public ST.Object sendN(ST.Object receiver, int selector, ST.Object[] args) {
            switch (selector) {
            case 20: return foo(args[0]);
            case 21: return bar(args[0]);
            default:
                // Note that super.sendN could be inlined into this switch.
                return super.sendN(receiver, selector, args);
            }
        }
    }

Of course we will still need to map selector Symbols to ints, but we can cache the result.
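The selector-to-int mapping and its caching might look like the sketch below. SelectorTable and StSymbol are hypothetical names, and numbering the selectors in order of first use is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Assigns a small integer code to each distinct selector.
class SelectorTable {
    private final Map<String, Integer> codes = new HashMap<>();
    private int next = 0;

    synchronized int codeFor(String selector) {
        Integer code = codes.get(selector);
        if (code == null) {
            code = next++;
            codes.put(selector, code);
        }
        return code;
    }
}

// Hypothetical stand-in for ST.Symbol that caches its selector code.
class StSymbol {
    final String name;
    final int code;   // looked up once, when the symbol is created

    StSymbol(String name, SelectorTable table) {
        this.name = name;
        this.code = table.codeFor(name);
    }
}
```

Because Java case labels must be compile-time constants, the compiler generating the switch would have to use the same numbering as the run-time table, for example by fixing the codes when the code-base is compiled.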
We could even dispense with ST.Object and just use java.lang.Object if we re-map the methods in Object. For example the Smalltalk X ~~ Y can be directly compiled (with no method sends) to the Java Boolean.convert(!(X == Y)), where the static method returns the appropriate Boolean object given its argument.
Similarly, X = Y can be translated into Boolean.convert(X.equals(Y)). The tricky part of this is that we also have to translate definitions of = to be definitions of equals.
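Here is a small sketch of both translations together. StBoolean stands in for the Smalltalk Boolean class with its convert method; Money and Compare are hypothetical names invented for the example, with Money playing the role of a user class whose Smalltalk definition of = has been compiled into equals.

```java
// Hypothetical stand-in for the Smalltalk Boolean class.
class StBoolean {
    static final StBoolean TRUE = new StBoolean();
    static final StBoolean FALSE = new StBoolean();
    private StBoolean() {}

    // Returns the Smalltalk Boolean object for a Java boolean.
    static StBoolean convert(boolean b) { return b ? TRUE : FALSE; }
}

// A user class whose Smalltalk definition of = is emitted as equals.
class Money {
    final int cents;
    Money(int cents) { this.cents = cents; }

    @Override
    public boolean equals(Object other) {
        return other instanceof Money && ((Money) other).cents == cents;
    }

    @Override
    public int hashCode() { return cents; }
}

class Compare {
    // Translation of  X ~~ Y  (identity test, no message send)
    static StBoolean notIdentical(Object x, Object y) {
        return StBoolean.convert(!(x == y));
    }

    // Translation of  X = Y  (dispatches through Java's equals)
    static StBoolean equal(Object x, Object y) {
        return StBoolean.convert(x.equals(y));
    }
}
```

Since equals must return a Java boolean, this translation only works when the Smalltalk definition of = really does answer a Boolean.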
Similarly, we could replace Boolean with java.lang.Boolean, Symbol with interned Java Strings, and so on. The further we go in that direction, the more efficient and idiomatic the code is likely to be. But there are dangers. For example, if someone re-defines = to return a value that is not a Boolean (perhaps they like three-valued logics), then we will probably break their code. And if someone depends on Symbol being a sub-class of String, that will break if we represent Symbol using java.lang.String (since that only inherits from Object).