<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<?xml-stylesheet href="../docbook-css/driver.css" type="text/css"?>
<article id="root">
<articleinfo>
<title>Mixing Lisps in Kawa</title>
<authorgroup>
<author>
<firstname>Per</firstname> <surname>Bothner</surname>
</author>
</authorgroup>
</articleinfo>

<!-- biography:
Per Bothner received degrees from the University of Oslo and Stanford (Ph.D., 1988).
He was an early employee at Cygnus Support, the pioneering
company based on Free Software, where he worked on a number of projects.
He was the designer and technical lead of Gcj, a Java ahead-of-time
compiler based on GCC.
Per also developed Kawa, which compiles Scheme functions on-the-fly
into Java classes.  The Kawa framework is also being used to implement
other languages, including Emacs Lisp (Jemacs), Common Lisp, and XQuery.
Per now works in the San José area as a consultant supporting Kawa.
-->

<abstract>
<para>
Kawa started as a Scheme implementation written in Java,
based on compiling Scheme forms to Java byte-codes.
It has developed into a powerful Scheme dialect
whose strengths include speed and easy access to Java classes.
It is Free Software that some companies depend on.</para>
<para>
The Kawa compiler and run-time environment have been
generalized to implement other languages besides Scheme,
both in the Lisp family (Emacs Lisp,
Common Lisp, and BRL), and outside it (XQuery, Nice).
This paper focuses on the differences and challenges of implementing
Common Lisp (not usable yet) and
Emacs Lisp, which supports the JEmacs editor.</para>
</abstract>

<sect1 id="introduction"><title>Introduction</title>
<para>
Kawa <xref linkend="Kawa"/> is best known as an implementation of Scheme that
compiles to Java byte-codes.
However, it has evolved to a framework supporting multiple
languages, also including XQuery <xref linkend="Qexo"/>,
Emacs Lisp <xref linkend="JEmacs"/>,
and a start at Common Lisp.
(XQuery <xref linkend="XQuery"/> is an interesting language whose focus is on
selecting and generating XML-like tree structures.
It is in the process of being standardized by the World Wide Web Consortium.)
In this article we will focus on the Lisp family of languages.</para>

<para>
Of the Kawa languages, Scheme is the most mature and full-featured.
It is being used by various projects and companies.
Kawa's core supports a lot of Common Lisp features, but very little of
Common Lisp's syntax has been implemented.
The Emacs Lisp support is intermediate, comprising enough to get some
of the basic Emacs functionality working.
</para>

<para>
Data representation and calling conventions are largely the same for Scheme,
Common Lisp, and Emacs Lisp.
That is why some of the primitive Emacs Lisp and Common Lisp
functions and syntax are currently written in Scheme, just because
Scheme is more complete. Adding better support for type declarations
and access to Java methods would make it easier to write low-level
code in Emacs Lisp and Common Lisp; doing so would not be difficult.</para>
<para>
In the following I will touch on some of the interesting challenges
of a multi-language Java-based environment, focusing on Emacs Lisp
and Common Lisp challenges and their differences from Scheme.
In most ways we can view Emacs Lisp as a subset of Common Lisp
with a few quirks, plus dynamic (fluid) binding in place of
Common Lisp's default lexical binding.</para>
</sect1>

<sect1 id="languages"><title>Multiple Languages</title>
<para>
Traditional <quote>static</quote> compilers that generate
machine code often support <quote>front-ends</quote> for more than one language.
This is rarer for functional or dynamic languages (ones that
support <literal>eval</literal>).  One reason is that such
implementations are written by small groups that are primarily
interested in a single language.  Another is that higher-level
languages may have more complicated run-time needs, which are
harder to generalize to multiple languages.
Even related languages like Scheme and Lisp have annoying
differences that make life difficult, and we'll discuss some
of them in this paper.</para>
<para>
So why bother with multiple languages, rather than concentrating
on just one?  The reason is the same as for a multi-language
traditional compiler like GCC: different people need or prefer
different languages - and some people need to use multiple languages.
Given that a non-trivial compiler and run-time environment is
a large undertaking, it makes sense to share some of the code.</para>
<para>
It does not follow that Kawa should support mixing multiple
languages in the same application, but
that too can be useful, for example when calling a function library
written in one language from another language.
A Foreign Function Interface is traditionally used to enable
calling functions written in a lower-level language like C,
but calling functions written in another high-level language is
also sometimes useful.
One may also want to glue together modules written by different
groups that use different languages.</para>
<para>
Another case is migrating from one language to another.
Most of the Emacs editor is written in Emacs Lisp.
The FSF has a long-term goal of replacing Emacs Lisp by Scheme.
(Personally, I think a Common Lisp subset might make more sense,
as Emacs Lisp is closer to Common Lisp, and there exist packages
that add more Common Lisp functionality.)  That means there will be
a need for mixing Emacs Lisp with Scheme and/or Common Lisp.
(The FSF plan is to compile Emacs Lisp to Scheme but that's just
an implementation detail.)
</para>
<para>
Kawa uses a <classname>Language</classname> class that contains
various language hooks.  For example the read-eval-print loop and the
compiler are non-language-specific, but they call methods of
the current <classname>Language</classname> instance to perform
language-specific actions, such as parsing a source line or file.</para>
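<para>
The hook idea can be illustrated with a minimal Java sketch.
The names used here (<classname>LanguageHooks</classname>,
<literal>formatBoolean</literal>, <classname>SharedPrinter</classname>)
are hypothetical and chosen for illustration only; they are not
Kawa's actual <classname>Language</classname> API.
The point is only that shared code stays language-neutral and defers
to the current language object where the languages differ.</para>
<programlisting><![CDATA[
// Hypothetical sketch; these names are not Kawa's actual API.
abstract class LanguageHooks {
    abstract String name();

    // Language-specific hook: how a Boolean result is printed.
    abstract String formatBoolean(boolean value);
}

final class SchemeHooks extends LanguageHooks {
    String name() { return "scheme"; }
    String formatBoolean(boolean value) { return value ? "#t" : "#f"; }
}

final class LispHooks extends LanguageHooks {
    String name() { return "common-lisp"; }
    String formatBoolean(boolean value) { return value ? "T" : "NIL"; }
}

// Shared, language-neutral code that defers to the current language.
final class SharedPrinter {
    static String print(LanguageHooks lang, Object value) {
        if (value instanceof Boolean) {
            return lang.formatBoolean((Boolean) value);
        }
        return String.valueOf(value);
    }
}
]]></programlisting>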
</sect1>

<sect1 id="execution"><title>Execution and compilation</title>
<para>
Kawa is compiler-based, for good performance.
However, for dynamic languages such as Lisp, it is also important
to provide responsive implementations of <function>eval</function>
and <function>load</function>.
Kawa implements both an immediate-execution mode (which uses a
combination of interpretation and compilation, depending on the input form),
and a <quote>batch-compilation</quote> mode (where a module is compiled for
future use).
</para>
<para>
Kawa processes a form or module with these steps:</para>
<itemizedlist>
<listitem><para>
The source form is read, creating an S-expression in the usual way.
Pairs contain a line/column-number annotation, so we can include source
locations in messages, stack traces, and symbol tables.</para></listitem>
<listitem><para>
The input forms are rewritten to Kawa's internal format,
which is a nested tree of <classname>Expression</classname> objects.
Rewriting includes resolution of lexical names and macro expansion.
Kawa supports both hygienic and non-hygienic macros.</para></listitem>
<listitem><para>
Some tree-walking passes gather information, perform optimizations,
and select lambda representation.</para></listitem>
<listitem><para>
In immediate mode, if the form is simple enough, we now evaluate it
to yield the result value.</para></listitem>
<listitem><para>
Otherwise, Kawa performs code generation.  The top-level
<classname>Expression</classname> (specifically an instance of a
<classname>ModuleExp</classname>) is expanded to yield one or more Java
classes including byte-code instructions.</para></listitem>
<listitem><para>
In immediate mode, we immediately load the generated classes to create
<quote>live</quote> classes in the current Java run-time environment,
using Java's <classname>ClassLoader</classname> mechanism.
We invoke the <literal>run</literal> method on an instance of the new class.
In batch-compile mode the generated classes are written out as files
that can be loaded later.</para></listitem>
</itemizedlist>
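<para>
The choice made in immediate mode, between evaluating a form directly
and generating code for it, can be sketched in Java as follows.
This is an illustration under stated assumptions: the classes are
hypothetical stand-ins for Kawa's <classname>Expression</classname> tree,
and the <literal>isSimple</literal> criterion is invented here, since
the actual test is internal to Kawa.</para>
<programlisting><![CDATA[
// Hypothetical sketch of the immediate-mode decision;
// these are not Kawa's Expression classes.
abstract class Expr {
    abstract boolean isSimple();   // assumed criterion, for illustration
    abstract Object eval();        // direct evaluation of a simple form
}

final class Quote extends Expr {
    private final Object value;
    Quote(Object value) { this.value = value; }
    boolean isSimple() { return true; }
    Object eval() { return value; }
}

final class Sum extends Expr {
    private final Expr left, right;
    Sum(Expr left, Expr right) { this.left = left; this.right = right; }
    boolean isSimple() { return left.isSimple() && right.isSimple(); }
    Object eval() {
        int a = (Integer) left.eval();
        int b = (Integer) right.eval();
        return a + b;
    }
}

final class LambdaExpr extends Expr {
    boolean isSimple() { return false; }
    Object eval() {
        throw new UnsupportedOperationException("requires code generation");
    }
}

final class ImmediateMode {
    // Evaluate simple forms at once; hand everything else to the compiler.
    static String process(Expr form) {
        if (form.isSimple()) {
            return "evaluated: " + form.eval();
        }
        return "needs code generation";
    }
}
]]></programlisting>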
</sect1>

<sect1 id="values"><title>Values and Objects</title>
<para>
Java is a hybrid class-based object-oriented language.
It has <quote>unboxed</quote>
(non-heap-allocated) values, such as 32-bit signed integers.
However, unboxed values have to be declared at compile-time 
using <quote>primitive</quote> types.
Otherwise, all values are heap-allocated objects that are
instances of some class or an array type.  All classes and array types
inherit from the root <classname>Object</classname> class.</para>
<para>
Thus an important task in implementing a Lisp using Java is deciding on
how the various Lisp values are represented as Java objects.
The following sections discuss how Kawa does so.
Kawa can use a standard Java class when that provides
functionality close enough to that needed for a Lisp type.
Otherwise, Kawa uses its own classes.  Most of these are not Lisp-specific,
but can be used by any Kawa language, or directly from Java.</para>
</sect1>

<sect1 id="threads"><title>Threads and environment</title>
<para>
Kawa implements <firstterm>futures</firstterm>, which originated
in MultiLisp <xref linkend="MultiLisp"/>.
A future is implemented as a Java thread.
Dynamic bindings in a future are shared with those in the parent thread,
but within a <literal>fluid-let</literal> we get fresh bindings.
Common Lisp stores the value of a dynamic variable in the
<quote>value cell</quote> of a symbol.
However, the value binding needs to be per-thread, so Kawa
symbols don't use a value cell.  Instead, the symbol is conceptually used as
a key into the current thread's environment.
The actual implementation does the name lookup at class loading time,
allocating a <classname>Location</classname> object, and then
using the Java thread-local mechanism to get the current value.
</para>
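<para>
A minimal Java sketch of a per-thread dynamic binding follows.
<classname>FluidLocation</classname> is a hypothetical class, not Kawa's
<classname>Location</classname>, and it makes simplifying assumptions:
there is one shared binding plus a per-thread stack of
<literal>fluid-let</literal> bindings, bound values are non-null,
and a newly started thread sees only the shared binding.</para>
<programlisting><![CDATA[
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical sketch; FluidLocation is not Kawa's Location class.
final class FluidLocation<T> {
    // Binding seen by any thread that has no fluid-let active.
    private volatile T global;

    // Per-thread stack of fluid-let bindings (values must be non-null).
    private final ThreadLocal<ArrayDeque<T>> fluid =
        ThreadLocal.withInitial(ArrayDeque::new);

    FluidLocation(T initial) { global = initial; }

    T get() {
        ArrayDeque<T> stack = fluid.get();
        return stack.isEmpty() ? global : stack.peek();
    }

    // Assigns to the innermost binding visible to the current thread.
    void set(T value) {
        ArrayDeque<T> stack = fluid.get();
        if (stack.isEmpty()) {
            global = value;
        } else {
            stack.pop();
            stack.push(value);
        }
    }

    // Like fluid-let: a fresh binding for this thread while body runs.
    <R> R fluidLet(T value, Supplier<R> body) {
        ArrayDeque<T> stack = fluid.get();
        stack.push(value);
        try {
            return body.get();
        } finally {
            stack.pop();   // restore the previous binding, even on error
        }
    }
}
]]></programlisting>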
</sect1>

<sect1 id="symbols"><title>Symbols and Environments</title>
<para>
Scheme's symbols are simple:  All symbols are interned, and
there is only a single unnamed package.  Furthermore,
there is only a single value binding, rather than separate
value and function bindings, and there are no property list cells.
Emacs Lisp has only a single package, but symbols have separate
value and function bindings, as well as properties.</para>
<para>
Kawa supports multiple packages or namespaces.
As in Common Lisp, a package is a mapping from a print name to
a symbol object. Kawa doesn't yet support the full Common Lisp package
functionality, but it implements basic package <quote>inheritance</quote>.
Kawa symbols are stateless, with just a print name
and a pointer to their home package.</para>
<para>
As mentioned above, the <quote>value</quote> of a symbol isn't stored in
the symbol itself,
but it is found indirectly in the current environment, which allows multiple
concurrent interpreters, and thread-local bindings.
An environment is a two-dimensional mapping that maps
a symbol and an arbitrary property object to a location, which
may have thread-local bindings (adding a third dimension).
To get the value binding of a symbol, look it up in the current
environment, using null for the property.
To get the function binding of a symbol, use instead for the property
the uninterned <literal>FUNCTION</literal> symbol.</para>
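<para>
The two-dimensional mapping can be sketched in Java with a hash table
keyed on a (symbol, property) pair.
The class and method names are hypothetical, symbols are represented
as plain strings for brevity, and thread-local bindings are left out;
this is not Kawa's actual environment implementation.</para>
<programlisting><![CDATA[
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Kawa's environment or location classes.
final class Env {
    // Stands in for the uninterned FUNCTION symbol used as a property key.
    static final Object FUNCTION = new Object();

    static final class Location {
        Object value;
    }

    // Key pairing a symbol (a String here, for brevity) with a property.
    private static final class Key {
        final String symbol;
        final Object property;   // null selects the value binding

        Key(String symbol, Object property) {
            this.symbol = symbol;
            this.property = property;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            // Properties are compared by identity, like uninterned symbols.
            return symbol.equals(k.symbol) && property == k.property;
        }

        @Override public int hashCode() {
            return 31 * symbol.hashCode() + System.identityHashCode(property);
        }
    }

    private final Map<Key, Location> table = new HashMap<>();

    // Finds the location for (symbol, property), creating it on first use.
    Location lookup(String symbol, Object property) {
        return table.computeIfAbsent(new Key(symbol, property),
                                     k -> new Location());
    }
}
]]></programlisting>
<para>
With this sketch, <literal>env.lookup("car", null)</literal> yields the
value binding and <literal>env.lookup("car", Env.FUNCTION)</literal>
the function binding; the two locations are independent.</para>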
<para>
Property lists can be accessed via a special <literal>PLIST</literal>
property. Alternatively, you can use the property directly, for
constant-time access.  Combining Common Lisp semantics with
constant-time property-list access is a little tricky, but doable.
(Early Lisp implementations stored the value and function binding
using special properties on the property list, and that's
essentially what Kawa does, too - except it uses hashing.)</para>
<para>
Common Lisp function names are normally symbols, but can be of the
special form: <literal>(setf <replaceable>name</replaceable>)</literal>.
These are easily handled in Kawa by using a special <literal>SETTER</literal>
property.</para>
</sect1>

<sect1 id="sequences"><title>Sequences and Arrays</title>
<para>
Kawa includes a set of Java classes that implement
sequences and arrays.  The class hierarchy
is compatible with Common Lisp's type hierarchy.</para>
<para>
Kawa generalizes Common Lisp's arrays:
An array is an affine mapping onto a sequence, typically a vector.
The affine mapping is a linear combination
of the array indexes plus a displacement; this generalizes
Common Lisp displaced arrays.
The vector can be of primitive type, which gives us multiple-dimension
arrays of primitive type.</para>
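<para>
As an illustration of the affine mapping, here is a minimal Java sketch
of a multi-dimensional view over a primitive <literal>double</literal>
vector.  <classname>AffineArray</classname> and its methods are
hypothetical names, not Kawa's array classes.  The effective index is the
displacement plus the sum of each index multiplied by its coefficient
(stride), so a transposed view or a displaced array shares storage
with the original.</para>
<programlisting><![CDATA[
// Hypothetical sketch; not Kawa's array classes.
final class AffineArray {
    private final double[] base;   // underlying vector of primitive type
    private final int[] dims;      // extent of each dimension
    private final int[] strides;   // coefficient applied to each index
    private final int offset;      // displacement into the base vector

    AffineArray(double[] base, int[] dims, int[] strides, int offset) {
        if (dims.length != strides.length) {
            throw new IllegalArgumentException("dims and strides differ in rank");
        }
        this.base = base;
        this.dims = dims;
        this.strides = strides;
        this.offset = offset;
    }

    static AffineArray rowMajor(double[] base, int rows, int cols) {
        return new AffineArray(base, new int[] {rows, cols},
                               new int[] {cols, 1}, 0);
    }

    // The affine mapping: offset + sum of strides[i] * index[i].
    int effectiveIndex(int... index) {
        if (index.length != dims.length) {
            throw new IllegalArgumentException("wrong rank");
        }
        int k = offset;
        for (int i = 0; i < index.length; i++) {
            int j = index[i];
            if (j < 0 || j >= dims[i]) {
                throw new IndexOutOfBoundsException("index " + i);
            }
            k += strides[i] * j;
        }
        return k;
    }

    double get(int... index) { return base[effectiveIndex(index)]; }

    void set(double value, int... index) { base[effectiveIndex(index)] = value; }

    // Transposed view of a rank-2 array; shares the same base vector.
    AffineArray transpose() {
        if (dims.length != 2) throw new IllegalStateException("rank 2 only");
        return new AffineArray(base, new int[] {dims[1], dims[0]},
                               new int[] {strides[1], strides[0]}, offset);
    }
}
]]></programlisting>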
</sect1>

<sect1 id="nil"><title>Nil</title>
<para>In Scheme the empty list, the symbol <literal>nil</literal>,
and the Boolean false value (<literal>#f</literal>) are three distinct objects.
Common Lisp and Emacs Lisp require that these all be the same object.
We don't want to convert lists from one language representation to
another when calling across languages, which dictates that we
use Scheme's empty list value for <literal>nil</literal>.
This means Boolean false differs between the languages, and so
<literal>(if x y z)</literal> in Scheme compares <literal>x</literal> against
the <literal>Boolean.FALSE</literal> object, and in Lisp
it compares <literal>x</literal> against the empty list object.  Similarly,
the <literal>nil</literal> symbol in the <literal>common-lisp</literal>
package is a special case: It's represented by the empty list object,
rather than an instance of the <literal>Symbol</literal> class.</para>
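<para>
The difference in truth tests can be written down as a short Java sketch.
The class <classname>Truth</classname> and the
<literal>EMPTY_LIST</literal> constant are hypothetical stand-ins for
illustration; only the comparison against
<literal>Boolean.FALSE</literal> is taken from the description above.</para>
<programlisting><![CDATA[
// Hypothetical sketch of the per-language truth test.
final class Truth {
    // Stand-in for the single shared empty-list object (an assumption).
    static final Object EMPTY_LIST = new Object() {
        @Override public String toString() { return "()"; }
    };

    // Scheme: everything except the Boolean.FALSE object counts as true.
    static boolean isTrueScheme(Object x) {
        return x != Boolean.FALSE;
    }

    // Common Lisp and Emacs Lisp: everything except the empty list is true.
    static boolean isTrueLisp(Object x) {
        return x != EMPTY_LIST;
    }
}
]]></programlisting>
<para>
Under this sketch the empty list is true in Scheme but false in Lisp,
while <literal>Boolean.FALSE</literal> is false in Scheme but true in
Lisp, so code crossing the language boundary has to be aware of which
test applies.</para>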
</sect1>

<sect1 id="streams"><title>Streams</title>
<para>
Input and output streams are implemented using Kawa classes
that extend standard Java classes.
(Scheme uses the term <firstterm>port</firstterm>
where Common Lisp uses <firstterm>stream</firstterm>.)
Input streams use the Kawa class
<classname>InPort</classname>, which extends the standard
Java <classname>Reader</classname> class.
Output streams use the class
<classname>OutPort</classname>, which extends the standard
Java <classname>PrintWriter</classname> class.
Common Lisp bidirectional streams aren't currently supported,
but would be trivial to add, as they're just a pairing of
an input stream and an output stream.
A Kawa interactive stream is an input stream that may be tied to
an output stream (that is flushed before input), and may have a
prompt procedure (whose result is printed at the start of a new line).
A Common Lisp interactive stream is slightly different:
a bidirectional stream that
wraps the input stream and its tied output stream.
</para>
<para>
A <firstterm>consumer</firstterm> is a generalized output stream
interface to write arbitrary values, not just characters.
Output streams implement the consumer interface by formatting
non-character objects.  In addition, various other data structures
also implement the consumer interface, which is used for a number
of purposes, including <quote>writing</quote> multiple value results.</para>

<sect2 id="reader"><title>Readers and read tables</title>
<para>
Kawa's Scheme/Lisp reader follows the Common Lisp specification,
including using a programmable read-table.
</para>
</sect2>

<sect2 id="printing"><title>Printing</title>
<para>Kawa includes a fairly complete implementation of
<literal>format</literal> (written in Java).  It also includes
the pretty-printer from SBCL, translated into Java.
(The re-implementation uses arrays rather than lists, and so
should be a bit more efficient.)
The Lisp programming interface, including the tables for pretty-printing
Lisp source code, is mostly missing, but the low-level functionality works
quite well.  Cycle detection is not implemented yet.</para>
</sect2>
</sect1>

<sect1 id="multiple-values"><title>Multiple values</title>
<para>
Expressions in both Scheme and Common Lisp can return
<quote>multiple values</quote>.  A big difference is that in
Common Lisp multiple values can be coerced to a single value.
XQuery expressions evaluate to sequences, which in some ways
are similar to multiple values, in that a sequence consisting of a single item
is the same as that item.
A major difference is that XQuery sequences can be concatenated and can become
arbitrarily large, while Lisp expressions can only return a small number
of values, explicitly enumerated in the program.
Kawa represents XQuery sequences
and Lisp multiple values the same way.</para>
<para>
Kawa uses two basic representations for multiple values:
An explicit representation stores the values in a data structure.
The data structure is usually pre-allocated in a per-thread object,
reducing the need for memory allocation.
Multiple values can also be passed implicitly, as a stream of values.
In this model the results of an expression are passed to the current
<classname>Consumer</classname> as they are generated.
Output streams implement the <classname>Consumer</classname> interface,
so the values produced by a top-level expression are printed as soon
as they are generated.
Such stream-based processing is very suitable for XQuery, but I
have also experimented with Lisp dialects based on this model.</para>
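<para>
The implicit, stream-based model can be sketched in Java as follows.
The interface and class names are hypothetical and much simpler than
Kawa's <classname>Consumer</classname>; the sketch only shows a producer
writing its results to whatever consumer is current, so that the same
producer can either collect its values or print them as they are
generated.</para>
<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a consumer interface.
interface ValueConsumer {
    void writeObject(Object value);
}

// Explicit representation: collect the values in a data structure.
final class ListConsumer implements ValueConsumer {
    final List<Object> values = new ArrayList<>();
    public void writeObject(Object value) { values.add(value); }
}

// Output-stream style: format each value as soon as it arrives.
final class PrintingConsumer implements ValueConsumer {
    private final StringBuilder out;
    PrintingConsumer(StringBuilder out) { this.out = out; }
    public void writeObject(Object value) {
        out.append(value).append('\n');
    }
}

final class Producers {
    // Produces two values, quotient and remainder, by writing each
    // to the consumer rather than returning a single object.
    static void floorDivide(int a, int b, ValueConsumer out) {
        out.writeObject(Math.floorDiv(a, b));
        out.writeObject(Math.floorMod(a, b));
    }
}
]]></programlisting>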
</sect1>

<sect1 id="types"><title>Types</title>
<para>Java has a standard <quote>meta-object protocol</quote> which
allows you to query the class of an object and its properties.
However, some Kawa languages (especially XQuery) need more extensive
type information.  Kawa has a separate <quote>type</quote> hierarchy
for this reason, and also because
a compiler needs to be able to talk about classes that don't yet exist.
Kawa has extended Scheme's syntax to allow declaring the types of
variables, parameters, and results.  In the following example,
the parameters <literal>x</literal> and <literal>y</literal>
and the result value are all native (unboxed) 32-bit integers:
</para>
<programlisting>
(define (int-max (x :: &lt;int&gt;) (y :: &lt;int&gt;))
    :: &lt;int&gt;
  (if (&gt; x y) x y))
</programlisting>
<para>This helps in generating faster code:
The above function compiles to byte-code instructions
that operate on 32-bit unboxed Java integers.  Kawa automatically converts
arguments and results as needed.
Type specifiers also improve compile-time error detection,
and make it very convenient to call Java methods from Scheme.</para>
<para>
Kawa doesn't yet support the
Common Lisp declaration forms; adding those is probably the biggest
priority for Common Lisp and Emacs Lisp, since a type declaration facility
is very helpful in writing Lisp code that invokes Java features.</para>
</sect1>

<sect1 id="functions"><title>Functions</title>
<para>
Kawa uses a number of different conventions, optimizations, and tricks
for compiling function calls to Java code.
When the called function is known, Kawa may emit a direct method
invocation, or inline the function's body.
The most general mechanism assumes a function is represented by a
Java object that implements at least the following two methods:</para>
<itemizedlist>
<listitem><para>
The <literal>matchN</literal> method takes an array containing the
actual arguments.
It returns a negative error code if the arguments have the
wrong number or types.
Otherwise, the arguments are copied (possibly coerced) to a per-thread
parameter save area, and <literal>matchN</literal> returns 0.
</para></listitem>
<listitem><para>
The <literal>apply</literal> method evaluates the function body,
using the parameters from the parameter save area.  The function's
result is <quote>written</quote> to a provided <classname>Consumer</classname>.
</para></listitem>
</itemizedlist>
<para>
This separation makes proper tail calls possible, even though Java itself does not support them.
A tail-call evaluates the parameters, and calls <literal>matchN</literal>.
If that returns non-zero, an exception is thrown.  Otherwise, the function
containing the tail-call returns.  The <literal>apply</literal> method
is called later, after the calling stack frame has been popped.</para>
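<para>
The following Java sketch shows how such a split gives constant-stack
tail calls by way of a driver loop (a trampoline).
All names are hypothetical, and the sketch simplifies in two ways that
should be read as assumptions: the parameter save area is a single
array in a context object, and the result goes into a plain field
rather than being written to a <classname>Consumer</classname>.</para>
<programlisting><![CDATA[
// Hypothetical sketch; not Kawa's actual classes.
final class CallContext {
    Proc pending;     // function whose arguments have already been matched
    Object[] args;    // parameter save area
    Object result;    // simplification: one result slot, not a Consumer

    // Driver loop: run pending calls until none is left.
    // Each apply returns before the next begins, so the Java stack stays flat.
    Object run() {
        while (pending != null) {
            Proc p = pending;
            pending = null;
            p.apply(this);
        }
        return result;
    }
}

abstract class Proc {
    // Returns 0 on success, a negative code on wrong argument count or type.
    abstract int matchN(Object[] actuals, CallContext ctx);

    // Evaluates the body using the saved parameters.
    abstract void apply(CallContext ctx);

    // A tail call only matches the arguments; apply runs later in the loop.
    static void tailCall(Proc p, Object[] actuals, CallContext ctx) {
        if (p.matchN(actuals, ctx) != 0) {
            throw new IllegalArgumentException("bad arguments");
        }
    }
}

// Counts down to zero by calling itself in tail position.
final class CountDown extends Proc {
    int matchN(Object[] actuals, CallContext ctx) {
        if (actuals.length != 1) return -1;
        if (!(actuals[0] instanceof Integer)) return -2;
        ctx.args = actuals;
        ctx.pending = this;
        return 0;
    }

    void apply(CallContext ctx) {
        int n = (Integer) ctx.args[0];
        if (n == 0) {
            ctx.result = "done";
        } else {
            Proc.tailCall(this, new Object[] { n - 1 }, ctx);
        }
    }
}
]]></programlisting>
<para>
Starting the loop with a count of one million completes without deep
recursion, because each <literal>apply</literal> has returned
before the next one is invoked.</para>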
<para>
To call a generic function, we invoke the <literal>matchN</literal>
methods of the generic's constituent method functions.
If needed, we select the most specific matching method,
and call its <literal>apply</literal>.</para>
<para>
Kawa supports optional, keyword, and rest parameters,
in Scheme as well as Common Lisp and Emacs Lisp.
</para>
</sect1>

<sect1 id="class"><title>Defining new classes</title>
<para>
Kawa Scheme provides a <literal>define-class</literal> form which
is similar to that in STk and Guile, which in turn are derived
from Common Lisp's <literal>defclass</literal>.
You can use it to define a Java class using Scheme syntax.
It supports multiple inheritance fairly efficiently,
by making use of Java interfaces.</para>
<para>
The <literal>define-simple-class</literal> form has the same syntax
as <literal>define-class</literal>, but is restricted to single inheritance.
This allows a direct translation into a Java class, without needing to
define helper interfaces.  The result is slightly more efficient but, more
importantly, makes it easier to use the generated classes from Java.</para>
<para>
Both forms allow you to define methods belonging to a class, as an alternative
to Common Lisp's generic function mechanism, which can also be used.
</para>
<para>
Some features of CLOS, such as <literal>change-class</literal>,
may be difficult to implement
without adding extra overhead that may be hard to justify.</para>
</sect1>

<sect1 id="exceptions"><title>Conditions; continuations</title>
<para>
Scheme's <literal>call-with-current-continuation</literal> function
can be used to perform general control transfers.  Kawa currently
only implements limited <quote>exiting</quote> continuation calls,
implemented using Java exceptions.  General continuations are planned,
but not yet implemented.  Non-local exits can be implemented using
Scheme exceptions, or in the future using continuations.
This can be used to implement Common Lisp condition handlers.
</para>
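<para>
An <quote>exiting</quote> continuation built on Java exceptions can be
sketched as follows.  The class and method names are hypothetical, not
Kawa's implementation.  The continuation is valid only while the body is
still running; invoking it throws an exception that is caught by the
matching call, and a private tag object keeps nested uses apart.</para>
<programlisting><![CDATA[
import java.util.function.Function;

// Hypothetical sketch of escape-only continuations; not Kawa's classes.
final class Escape {
    // Carries the value out; 'owner' identifies the call it belongs to.
    private static final class ExitException extends RuntimeException {
        final Object owner;
        final Object value;

        ExitException(Object owner, Object value) {
            super(null, null, false, false);   // no stack trace needed
            this.owner = owner;
            this.value = value;
        }
    }

    interface Continuation {
        Object invoke(Object value);   // never returns normally
    }

    static Object callWithEscape(Function<Continuation, Object> body) {
        final Object tag = new Object();
        Continuation k = value -> { throw new ExitException(tag, value); };
        try {
            return body.apply(k);      // normal return, continuation unused
        } catch (ExitException e) {
            if (e.owner == tag) {
                return e.value;        // our continuation was invoked
            }
            throw e;                   // belongs to an enclosing call
        }
    }
}
]]></programlisting>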
</sect1>

<sect1 id="modules"><title>Modules</title>
<para>
Kawa supports separately compiled modules.  Normally a source file
gets compiled into a <quote>module class</quote> plus sometimes
some auxiliary helper classes.  Each exported top-level definition gets compiled
to a Java field.  Importing (requiring) a module works by importing
the values bound to the fields.  Macros are also compiled into macro objects.
Macro <quote>hygiene</quote> works across modules: an exported
macro may expand to a form that references a non-exported definition.</para>
</sect1>

<sect1 id="emacs-values"><title>Emacs types</title>
<para>
The core of the Emacs Lisp language is one of many dynamically scoped
Lisp extension languages.  What makes it interesting is its embedding in
Emacs, and the special data types used by Emacs.
These include buffers, windows, frames, and key-maps.
Kawa includes basic implementations of classes for these Emacs values,
written from scratch in Java.  Actually, currently there are two
implementations of some of these classes.
The initial implementation used the standard Swing toolkit.
Recently, Christian Surlykke has contributed support for the
SWT toolkit (from the Eclipse IDE), and we've made the
JEmacs core classes platform-independent.</para>
</sect1>

<sect1 id="debugging"><title>Editing and debugging</title>
<para>
Kawa emits standard Java debug information, including line numbers
and local variable names.  Thus Java stack traces contain line numbers
referencing the Kawa input file.  It is also possible to debug Kawa
programs (at least Scheme) using an IDE like Eclipse.  The latter is
helped by an Eclipse plugin written by Dominique Boucher, which
includes a nice Scheme/Lisp editor with support for Kawa extensions.
The result is the beginnings of a Scheme debugger, but it isn't
terribly friendly yet.  One issue is that Scheme/Lisp symbol names
need to be <quote>mangled</quote> (translated) into valid Java names.
(This is unfortunately required by the Java Virtual Machine, for no
good reason I know of.)  The IDE doesn't have support for producing the
reverse mapping.  Printing Lisp values is less than ideal, though tolerable.
There is no way to input Scheme/Lisp expressions, for example in
conditional breakpoint predicates.  The IDE knows nothing about how
closures and lambdas are translated into Kawa classes, which means the
programmer has to know this instead.  Still, this is a good step towards
good Scheme/Lisp support in one of the world's most popular IDEs.</para>
</sect1>

<sect1 id="summary"><title>Summary</title>
<para>
Kawa is a full-featured and mature environment for compiling and running
high-level languages on the Java platform.  The Scheme implementation is
the most popular and complete, but other languages are also being
implemented.
Kawa is especially convenient for efficient implementations of Lisp variants.
My time to devote to Emacs Lisp and Common Lisp has been limited;
collaborators will be very welcome.</para>
</sect1>

<bibliography>
<title>Bibliography</title>

<biblioentry id="JEmacs">
<abbrev>JEmacs</abbrev>
<authorgroup>
<author><firstname>Per</firstname> <surname>Bothner</surname></author>
</authorgroup>
<title>JEmacs - The Java/Scheme-based Emacs</title>
<bibliomisc>Free Software Magazine (original incarnation)</bibliomisc>
<pubdate>2002</pubdate>
<bibliomisc><ulink url="http://per.bothner.com/papers/JEmacs02"/></bibliomisc>
</biblioentry>

<biblioentry id="Kawa">
<abbrev>Kawa</abbrev>
<authorgroup>
<author><firstname>Per</firstname> <surname>Bothner</surname></author>
</authorgroup>
<title>Kawa: Compiling Scheme to Java</title>
<bibliomisc>Lisp Users Conference (Berkeley)</bibliomisc>
<pubdate>1998</pubdate>
<bibliomisc><ulink url="http://www.gnu.org/software/kawa"/></bibliomisc>
</biblioentry>

<biblioentry id="MultiLisp">
<abbrev>MultiLisp</abbrev>
<authorgroup>
<author><firstname>Robert</firstname> <surname>Halstead</surname></author>
</authorgroup>
<title>MultiLisp: A Language for Concurrent Symbolic Computation</title>
<bibliomisc>TOPLAS 7(4):501-538</bibliomisc>
<pubdate>1985</pubdate>
</biblioentry>

<biblioentry id="Qexo">
<abbrev>Qexo</abbrev>
<authorgroup>
<author><firstname>Per</firstname> <surname>Bothner</surname></author>
</authorgroup>
<title>Compiling XQuery to Java bytecodes</title>
<bibliomisc>First International Workshop on XQuery Implementation
Experience and Perspectives (XIME-P)</bibliomisc>
<pubdate>2004</pubdate>
<bibliomisc><ulink url="http://per.bothner.com/papers/Qexo04"/></bibliomisc>
</biblioentry>

<biblioentry id="XQuery">
<abbrev>XQuery</abbrev>
<title>XQuery 1.0: An XML Query Language</title>
<bibliomisc><ulink url="http://www.w3c.org/XML/Query"/></bibliomisc>
</biblioentry>

</bibliography>

</article>
