JEmacs - the Java/Scheme-based Emacs
Per Bothner
per@bothner.com
Introduction
One of the FSF's long-term goals for Emacs is to replace the
extension/scripting language "Emacs Lisp" (ELisp) with Scheme,
while also providing a translator to convert old ELisp
files to Scheme. One reason is that ELisp is an ad-hoc,
non-standard Lisp variant not used anywhere else, and one
that is not consistent with modern programming-language ideas.
Another reason is that Guile, the primary GNU dialect of
Scheme, is intended to be the standard extension language
for GNU software, so it makes sense for Emacs (the
main GNU application with extensive use of a scripting language)
to follow suit.
Kawa is an implementation of Scheme that runs
on top of Java. This by itself is not unique; what makes Kawa unique
is that Scheme is compiled to Java bytecodes, with non-trivial
optimizations. It also provides almost all the other features
you expect from a production Scheme system (including eval and
load), as well as convenient interaction between Scheme and Java.
(Kawa was the subject of a Freenix 98 presentation.)
Kawa is also a GNU application: Guile is best for scripting
applications written mainly in C or C++, while Kawa is ideal
for scripting Java applications.
I have been working on a "next-generation" Emacs, based on Kawa.
The design includes:
An implementation of the Scheme language.
An implementation of the ELisp (core) language, such as
functions for creating lists and strings, defining functions,
and macros. (Barely started as of 99/11.)
A set of Java classes, based on the Swing GUI API, that
implement the Emacs "types", such as Buffer, Keymap, Window, and Marker.
A set of Scheme bindings to the Java methods.
These are "similar to" and have the same names as standard Emacs Lisp
functions, but are written in Scheme and intended to be called from Scheme.
The equivalent ELisp functions: Implementations of the high-level
Emacs functions as ELisp functions, so existing ELisp applications
can (mostly) run without change.
The totality of these features is what I call "JEmacs".
JEmacs in action
Motivation
This is a major, perhaps foolhardy, undertaking. Here are
some reasons why it might make sense. I expand on these below.
Swing is a modern GUI toolkit with good support for major Emacs
concepts.
Building on a Java run-time means we benefit from the work
being done to run Java (bytecodes) fast.
Java is multi-threaded.
Kawa is a modern object-oriented Scheme, while Emacs is based
on rather old design ideas.
Java is based on Unicode and has good internationalization
support.
Java has lots of neat packages we can use.
It would be useful to have Scheme (and ELisp) scripting
for Swing applications.
It is a good way to learn Swing!
Symbols
The symbol data type in Scheme is very simple:
It is an immutable atomic string; you can create
a symbol from a (mutable) string, and you can convert the
symbol back to a string (for example for printing).
The only non-trivial feature is that whenever you convert
a string to a symbol, you always get the same identical
symbol, as long as the strings have the same characters.
This process is called interning and is
implemented using a global hash-table.
Symbols are used for multiple purposes, but the most important
one is that identifiers in a Scheme program are (internally) represented
using symbols.
Java has a similar datatype, the class String,
which is used throughout Java. Java has a method,
called intern, which returns an interned version
of the String. This functionality is exactly what
is needed for Scheme, so Kawa uses String
for Scheme symbols. This has the side benefit of increasing
interoperability between Scheme and Java.
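As a small illustration of why String.intern fits Scheme symbols, the following Java sketch shows that two distinct String objects with the same characters intern to one identical object, so symbols can be compared with ==. The helper name stringToSymbol is hypothetical; it is not Kawa's actual API.

```java
class SymbolDemo {
    // Hypothetical helper: Scheme's string->symbol modeled with String.intern().
    static String stringToSymbol(String name) {
        return name.intern();
    }

    public static void main(String[] args) {
        String a = new String("car");                               // a fresh String object
        String b = new StringBuilder("ca").append('r').toString();  // another object, same characters
        System.out.println(a == b);                                 // false: distinct objects
        System.out.println(stringToSymbol(a) == stringToSymbol(b)); // true: one interned object
        System.out.println(stringToSymbol(a) == "car");             // true: literals are interned too
    }
}
```

Because interned Strings are canonical, an identity test is enough for symbol equality, which is what makes the String representation cheap.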
ELisp symbols are more complex: In addition to the print name,
they have a value cell, a function cell, and a property list.
The Swing Toolkit
Swing is the "next-generation" GUI toolkit for Java. It has a
lot of functionality and many useful features. Of particular
interest is that the "text" support in Swing is both very
powerful and apparently inspired by Emacs ideas.
Swing separates a "Document" from a "View"
on the document, which is essentially the same as the Emacs
Buffer vs. Window distinction. Swing also provides support
for features like keymaps and markers. (Unfortunately, the semantics
are often similar to Emacs, but not quite the same, so we need to
work around the differences.)
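The Document/View split can be seen without any GUI at all: a Swing Document notifies every registered DocumentListener of each change, which is roughly how views on a document (like Emacs windows on a buffer) are kept up to date. In this sketch WindowLike is a hypothetical stand-in that only records notifications; real views are Swing text components, not these objects.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

class DocumentViewDemo {
    // Hypothetical stand-in for a window: it only records change notifications.
    static final class WindowLike implements DocumentListener {
        final List<String> events = new ArrayList<>();

        @Override public void insertUpdate(DocumentEvent e) {
            events.add("insert@" + e.getOffset() + "+" + e.getLength());
        }
        @Override public void removeUpdate(DocumentEvent e) {
            events.add("remove@" + e.getOffset() + "-" + e.getLength());
        }
        @Override public void changedUpdate(DocumentEvent e) {
            events.add("attributes@" + e.getOffset()); // style changes; not triggered below
        }
    }

    public static void main(String[] args) throws BadLocationException {
        Document buffer = new PlainDocument();   // the model, like an Emacs buffer
        WindowLike top = new WindowLike();       // two "windows" on the same buffer
        WindowLike bottom = new WindowLike();
        buffer.addDocumentListener(top);
        buffer.addDocumentListener(bottom);

        buffer.insertString(0, "Hello, world", null);
        buffer.remove(5, 7);                     // delete ", world"

        System.out.println(buffer.getText(0, buffer.getLength())); // Hello
        System.out.println(top.events);          // [insert@0+12, remove@5-7]
        System.out.println(bottom.events);       // the same notifications
    }
}
```

Both "windows" see every edit because the text lives only in the shared Document, just as two Emacs windows showing one buffer stay consistent.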
Swing has some other nice features, such as "pluggable-look-and-feel"
(themeability), a number of flexible "widgets", and support for
"structured" buffers (i.e. XML/HTML structure in a Buffer).
One problem with Swing is that while it is portable and freely
redistributable, it does not have a free license, and there are so far
no free re-implementations. (A related issue is that the
documentation of Swing sucks.) I hope that by the time JEmacs
becomes usable, a free re-implementation of (the needed subset
of) Swing will be available, and perhaps JEmacs will encourage that to
happen. If not, we may rewrite JEmacs so it can be built on top of
some other free library (such as Gtk/Gnome or Qt).
Performance
A primary advantage of JEmacs is that Kawa is potentially much faster
than either ELisp or Guile. Using an optimizing compiler that
compiles to bytecode is certainly going to be faster than using Guile's
or Emacs's simple interpreter. The Emacs bytecode compiler is the same
idea, and produces a bytecode format that is more suitable to Emacs than
Java bytecodes. However, there are many projects and companies
working very hard on running Java bytecodes fast. The common
approach is to use a "Just-in-Time compiler" (JIT), which dynamically
compiles a bytecode method into native code *inside* the runtime.
Another approach is to use a traditional "ahead-of-time" compiler
(such as the GCC-based GCJ).
Multi-threading
One problem with traditional Emacs is that it is single-threaded. If
you start some non-trivial operation (such as getting new mail), your
Emacs session will be unusable until the operation completes. Java is
designed to be multi-threaded, so it is straightforward to create a
multi-threaded Emacs. A complication is that the Emacs Lisp execution
model is inherently single-threaded (any ELisp operation can switch to
another buffer or window). One solution is to give each buffer its
own thread. That does not work, because some applications (such as
mail readers) manage multiple buffers. Instead, we can give each
*buffer group* its own thread. When a new buffer is created, by
default it would get put in the same buffer group as the current
buffer. Another solution is to have each command use a new "worker"
thread, and use synchronization to serialize buffer access. (The
current implementation only uses a single thread.)
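The buffer-group idea can be sketched with the java.util.concurrent executors: each group owns a single-threaded executor, so commands within a group run one at a time, while different groups run concurrently. The BufferGroup and Buffer classes are hypothetical, and the latch only exists to make the demonstration deterministic: the slow "get new mail" command can finish only after editing in another group has gone ahead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BufferGroupDemo {
    // Hypothetical: one thread per buffer group; commands in a group run in order on it.
    static final class BufferGroup {
        private final ExecutorService thread = Executors.newSingleThreadExecutor();

        Future<?> run(Runnable command) { return thread.submit(command); }

        void shutdown() { thread.shutdown(); }
    }

    // Hypothetical buffer whose text is only touched from its group's thread.
    static final class Buffer {
        final String name;
        final BufferGroup group;
        final StringBuilder text = new StringBuilder();

        Buffer(String name, BufferGroup group) { this.name = name; this.group = group; }

        // By default, a buffer created from this (current) buffer joins the same group.
        Buffer createBuffer(String newName) { return new Buffer(newName, group); }
    }

    static boolean demo() throws Exception {
        BufferGroup mail = new BufferGroup();
        BufferGroup editing = new BufferGroup();
        Buffer inbox = new Buffer("*inbox*", mail);
        Buffer summary = inbox.createBuffer("*summary*"); // same group as *inbox*
        Buffer notes = new Buffer("notes.txt", editing);

        CountDownLatch typed = new CountDownLatch(1);
        boolean[] typingHappenedFirst = new boolean[1];

        // A slow "get new mail" command: it waits until the user has typed elsewhere.
        Future<?> getMail = mail.run(() -> {
            try {
                typingHappenedFirst[0] = typed.await(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inbox.text.append("3 new messages");
            summary.text.append("3 new");
        });
        // Editing in another group proceeds while the mail command is still running.
        Future<?> edit = editing.run(() -> {
            notes.text.append("hello");
            typed.countDown();
        });

        edit.get();
        getMail.get();
        mail.shutdown();
        editing.shutdown();
        return typingHappenedFirst[0]
                && summary.group == inbox.group
                && notes.text.toString().equals("hello")
                && inbox.text.toString().equals("3 new messages");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // true
    }
}
```

If both buffers shared one thread, the mail command would block until the latch timed out; with separate groups the user keeps typing. A command that touches buffers in two different groups would still need extra synchronization, which this sketch does not address.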
Java classes for Emacs
Among the Java classes implemented are the following:
Buffer: An Emacs buffer.
Contains a Swing StyledDocument object that
manages the actual text (and styles).
Contains a BufferKeymap, which manages the actions
executed for different keystrokes.
BufferKeymap: A data structure in one-to-one
association with a Buffer.
It implements the Swing Keymap interface, and
manages the primitive Keymaps, to give the correct
Emacs functionality.
Window: Extends the Swing JTextPane class.
Includes an associated Modeline and a scrollbar.
Frame: A top-level window. A Frame contains a nested hierarchy
of Windows, sub-divided using Swing's JSplitPane.
Marker: A position in a buffer that gets adjusted as needed.
Similar to the Swing Position class, but also knows the Buffer
it points to.
Content: The actual characters of the Buffer.
This class is needed because standard Swing does not
support the Marker semantics we need.
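A Marker can be sketched by pairing a Swing Position (obtained from Document.createPosition) with the buffer it belongs to. This hypothetical version uses a PlainDocument and only exercises insertion and deletion strictly before the marker; the edge cases where Swing's Position behavior differs from Emacs markers are what the custom Content class is for, and are not shown here.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;
import javax.swing.text.Position;

class MarkerDemo {
    // Hypothetical buffer wrapper; the real JEmacs Buffer holds a StyledDocument.
    static final class Buffer {
        final String name;
        final Document document = new PlainDocument();
        Buffer(String name) { this.name = name; }
    }

    // Like a Swing Position, but it also knows which Buffer it points into.
    static final class Marker {
        final Buffer buffer;
        private final Position position;

        Marker(Buffer buffer, int offset) throws BadLocationException {
            this.buffer = buffer;
            this.position = buffer.document.createPosition(offset);
        }

        int getOffset() { return position.getOffset(); }
    }

    public static void main(String[] args) throws BadLocationException {
        Buffer buf = new Buffer("*scratch*");
        buf.document.insertString(0, "world", null);
        Marker m = new Marker(buf, 2);                 // between "wo" and "rld"
        buf.document.insertString(0, "hello ", null);  // insert before the marker
        System.out.println(m.getOffset());             // 8: moved by the 6 inserted chars
        buf.document.remove(0, 6);                     // delete "hello "
        System.out.println(m.getOffset());             // 2 again
    }
}
```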
Action
Swing uses Keymaps to map a KeyStroke to an Action, which is what gets
executed. Emacs is similar, except looking up something in a keymap
yields a "keymap entry", of which there are many kinds. So what
JEmacs does is to "wrap" the Emacs-style keymap entries using special
subclasses of Action. For example, looking up a prefix key in Emacs
returns another keymap; in JEmacs it returns a PrefixAction.
Performing the PrefixAction modifies the current BufferKeymap state so
that when the next keystroke arrives it will look up a "key sequence"
that is the concatenation of the remembered prefix key(s) and the new
keystroke. One slight difference from standard Emacs: JEmacs
remembers previous prefix keys on a per-Buffer basis, so if you switch
to a different buffer with the mouse, the old prefix key is remembered
until you switch back.
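The prefix-key mechanism can be sketched with plain Swing KeyStroke and AbstractAction objects. Everything else here is hypothetical: a simple HashMap stands in for Swing's Keymap, and BufferKeymap, PrefixAction, and CommandAction are illustrative names, not the JEmacs classes. Each BufferKeymap keeps its own pending prefix, which reproduces the per-buffer behavior described above.

```java
import java.awt.event.ActionEvent;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.swing.AbstractAction;
import javax.swing.Action;
import javax.swing.KeyStroke;

class PrefixKeyDemo {
    // Hypothetical per-buffer state: pending prefix keys, plus a log of commands run.
    static final class BufferKeymap {
        final Map<KeyStroke, Action> globalKeymap;
        List<KeyStroke> pendingPrefix = new ArrayList<>();
        List<KeyStroke> currentSequence = new ArrayList<>();
        final List<String> executed = new ArrayList<>();

        BufferKeymap(Map<KeyStroke, Action> globalKeymap) { this.globalKeymap = globalKeymap; }

        // Look up a whole key sequence, descending through PrefixActions.
        Action lookup(List<KeyStroke> keys) {
            Map<KeyStroke, Action> map = globalKeymap;
            Action action = null;
            for (KeyStroke key : keys) {
                if (map == null) return null; // a non-prefix binding was followed by more keys
                action = map.get(key);
                map = (action instanceof PrefixAction) ? ((PrefixAction) action).keymap : null;
            }
            return action;
        }

        // Called for each keystroke typed while this buffer is current.
        void keyTyped(KeyStroke key) {
            currentSequence = new ArrayList<>(pendingPrefix);
            currentSequence.add(key);
            pendingPrefix = new ArrayList<>(); // cleared unless a PrefixAction re-arms it
            Action action = lookup(currentSequence);
            if (action == null) {
                executed.add("undefined");
            } else {
                action.actionPerformed(new ActionEvent(this, ActionEvent.ACTION_PERFORMED, "key"));
            }
        }
    }

    // Wraps a prefix keymap: performing it remembers the keys typed so far.
    static final class PrefixAction extends AbstractAction {
        final Map<KeyStroke, Action> keymap = new HashMap<>();

        @Override public void actionPerformed(ActionEvent e) {
            BufferKeymap buffer = (BufferKeymap) e.getSource();
            buffer.pendingPrefix = buffer.currentSequence;
        }
    }

    // An ordinary command; it just records its name.
    static final class CommandAction extends AbstractAction {
        CommandAction(String name) { super(name); }

        @Override public void actionPerformed(ActionEvent e) {
            ((BufferKeymap) e.getSource()).executed.add((String) getValue(Action.NAME));
        }
    }

    static KeyStroke ctrl(int keyCode) {
        return KeyStroke.getKeyStroke(keyCode, InputEvent.CTRL_DOWN_MASK);
    }

    // A tiny global keymap: C-f, and the prefix C-x with C-x C-f under it.
    static Map<KeyStroke, Action> sampleKeymap() {
        Map<KeyStroke, Action> global = new HashMap<>();
        PrefixAction ctrlX = new PrefixAction();
        ctrlX.keymap.put(ctrl(KeyEvent.VK_F), new CommandAction("find-file"));
        global.put(ctrl(KeyEvent.VK_X), ctrlX);
        global.put(ctrl(KeyEvent.VK_F), new CommandAction("forward-char"));
        return global;
    }

    public static void main(String[] args) {
        Map<KeyStroke, Action> global = sampleKeymap();
        BufferKeymap a = new BufferKeymap(global);
        BufferKeymap b = new BufferKeymap(global);
        a.keyTyped(ctrl(KeyEvent.VK_X)); // C-x in buffer a: prefix remembered in a
        b.keyTyped(ctrl(KeyEvent.VK_F)); // switch to b: plain C-f
        a.keyTyped(ctrl(KeyEvent.VK_F)); // back in a: completes C-x C-f
        System.out.println(a.executed);  // [find-file]
        System.out.println(b.executed);  // [forward-char]
    }
}
```

Typing C-x in one buffer, switching to another, and coming back leaves the C-x pending in the first buffer only.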
Some issues in implementing ELisp
There are some tricky issues if you want to translate ELisp
into Scheme, as is planned for Guile/Emacs. JEmacs has
a different plan: to translate ELisp into Java bytecodes, using
the existing Kawa framework. That avoids some of the tricky
issues, but many of them remain, assuming we want to have clean
interoperability between Scheme code and ELisp code.
Symbols
The first problem is that the semantics of lists and symbols differ
between ELisp and Scheme. A Scheme symbol is a simple immutable
atom; Kawa represents them using the Java String class. An ELisp
symbol is a more complex mutable data structure (with value and function
slots, and a property list). JEmacs will also represent ELisp
symbols as immutable String objects; the value, function and property
list slots are accessed by looking up the String in a hashtable.
This retrieves a Binding object, which contains the extra slots.
This extra hashtable lookup is slower than just extracting the
slots from a Symbol object; however for statically named symbols
Kawa does the String->Binding mapping at load time, and caches
the Binding. It does the same thing for Scheme, so we preserve
compatibility and efficiency.
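A minimal sketch of the String-to-Binding scheme, assuming a single global HashMap: the symbol itself is just an interned String, and the value, function, and property-list slots live in a separate Binding object. The CompiledCode class imitates what the text says Kawa does for statically named symbols, resolving the Binding once at load time so later accesses skip the hashtable. All class and field names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolBindings {
    // Hypothetical holder for the extra slots of an ELisp symbol.
    static final class Binding {
        final String symbol;   // the interned String that *is* the symbol
        Object value;          // value cell
        Object function;       // function cell (a separate namespace)
        Object plist;          // property list, left opaque in this sketch
        Binding(String symbol) { this.symbol = symbol; }
    }

    private final Map<String, Binding> table = new HashMap<>();

    // Slow path: find (or create) the Binding for a symbol via the hashtable.
    Binding lookup(String symbol) {
        return table.computeIfAbsent(symbol, Binding::new);
    }

    // Stand-in for compiled code that mentions the symbol fill-column by name.
    static final class CompiledCode {
        private final Binding fillColumn; // resolved once, when the code is loaded
        CompiledCode(SymbolBindings env) { this.fillColumn = env.lookup("fill-column"); }
        Object fillColumnValue() { return fillColumn.value; } // no hashtable lookup here
    }

    public static void main(String[] args) {
        SymbolBindings env = new SymbolBindings();
        env.lookup("fill-column").value = 70;
        CompiledCode code = new CompiledCode(env);               // "load time": cache the Binding
        System.out.println(code.fillColumnValue());              // 70
        env.lookup("fill-column").value = 72;                    // e.g. (setq fill-column 72)
        System.out.println(code.fillColumnValue());              // 72: same Binding object
        env.lookup("fill-column").function = "placeholder-fn";   // function cell is separate
        System.out.println(code.fillColumnValue());              // still 72
    }
}
```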
Nil - the empty list
In ELisp, the empty list and the symbol 'nil are the same object, but
in Scheme they are different. There are various ways to deal with the
problem, none particularly nice. In Kawa, lists inherit from the
abstract Sequence class. I feel it is important that the empty list
also be a Sequence, even for ELisp, and it is important to be able to
pass lists between Scheme and ELisp code. I decided that 'nil would be
represented as an instance of List, differently from all other
symbols. Thus the predicate (symbolp X) is implemented as
(X == List.Empty || X instanceof java.lang.String).
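The nil representation can be illustrated with a hypothetical Sequence hierarchy (these are not Kawa's real classes): the one empty-list object is both a Sequence and, from ELisp's point of view, a symbol, so the ELisp and Scheme symbol predicates differ only in how they treat it.

```java
class NilDemo {
    // Hypothetical stand-in for Kawa's abstract Sequence class.
    static abstract class Sequence {
        abstract int length();
    }

    // A cons cell.
    static final class Pair extends Sequence {
        final Object car;
        final Object cdr;
        Pair(Object car, Object cdr) { this.car = car; this.cdr = cdr; }

        @Override int length() { // counts the cons cells in the chain
            int n = 1;
            for (Object rest = cdr; rest instanceof Pair; rest = ((Pair) rest).cdr) n++;
            return n;
        }
    }

    // The single empty list; ELisp code also sees it as the symbol nil.
    static final class EmptyList extends Sequence {
        static final EmptyList EMPTY = new EmptyList();
        private EmptyList() {}
        @Override int length() { return 0; }
    }

    // ELisp (symbolp X): nil (the empty list) or any other symbol (a String).
    static boolean elispSymbolp(Object x) {
        return x == EmptyList.EMPTY || x instanceof String;
    }

    // Scheme (symbol? X): only Strings; the empty list is not a symbol.
    static boolean schemeSymbolp(Object x) {
        return x instanceof String;
    }

    public static void main(String[] args) {
        Sequence list = new Pair("a", new Pair("b", EmptyList.EMPTY));
        System.out.println(elispSymbolp(EmptyList.EMPTY));  // true
        System.out.println(schemeSymbolp(EmptyList.EMPTY)); // false
        System.out.println(elispSymbolp(list));             // false
        System.out.println(list.length());                  // 2
    }
}
```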
Variables
Variable lookup is different in Scheme and ELisp in two main ways:
ELisp uses dynamic scoping, while Scheme uses lexical scoping; and
ELisp has different namespaces for function names and variable
names, while Scheme has a single namespace for both. The latter is
an easy matter of the compiler emitting the code to look for the
name in the correct namespace. Handling dynamic scoping is more
tricky, but Kawa has already implemented the necessary framework
(the fluid-let form provides dynamic binding, using a very flexible
name-binding mechanism).
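Dynamic scoping is commonly implemented by shallow binding: save the variable's current value, install the new one, run the body, and restore the old value even if the body throws. The Java sketch below shows that technique for a single variable; it is not Kawa's fluid-let implementation, and DynamicVariable is a hypothetical name. Because describe refers to fill-column freely, it sees whatever binding is active when it is called.

```java
import java.util.function.Supplier;

class DynamicScopeDemo {
    // Hypothetical value cell for one dynamically scoped variable.
    static final class DynamicVariable<T> {
        private T value;

        DynamicVariable(T initial) { value = initial; }

        T get() { return value; }

        // Shallow binding: rebind for the dynamic extent of body, then restore.
        <R> R let(T newValue, Supplier<R> body) {
            T saved = value;
            value = newValue;
            try {
                return body.get();
            } finally {
                value = saved; // restored even if body throws
            }
        }
    }

    static final DynamicVariable<Integer> fillColumn = new DynamicVariable<>(70);

    // Uses fill-column freely, so it sees whatever binding is current when called.
    static String describe() {
        return "fill-column=" + fillColumn.get();
    }

    public static void main(String[] args) {
        System.out.println(describe());                                      // fill-column=70
        System.out.println(fillColumn.let(40, DynamicScopeDemo::describe)); // fill-column=40
        System.out.println(describe());                                      // fill-column=70
    }
}
```

This simple save/restore scheme assumes a single thread; with several threads each would need its own bindings.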
Builtins
ELisp has many builtin functions and macros which are different
from Scheme. There is no fundamental difficulty with this; just
a lot of porting/conversion work.
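As one concrete example of a builtin that must be ported rather than reused: ELisp's car accepts nil and returns nil, while Scheme's car signals an error on the empty list. A sketch of the two variants, with NIL and Pair as hypothetical stand-ins for the list representation:

```java
class CarDemo {
    // Hypothetical stand-in for the empty list, which is also nil in ELisp.
    static final Object NIL = new Object() {
        @Override public String toString() { return "nil"; }
    };

    // A cons cell.
    static final class Pair {
        final Object car;
        final Object cdr;
        Pair(Object car, Object cdr) { this.car = car; this.cdr = cdr; }
    }

    // Scheme: (car '()) is an error; only pairs have a car.
    static Object schemeCar(Object x) {
        if (x instanceof Pair) return ((Pair) x).car;
        throw new IllegalArgumentException("car: not a pair: " + x);
    }

    // ELisp: (car nil) returns nil; other non-lists signal wrong-type-argument.
    static Object elispCar(Object x) {
        if (x == NIL) return NIL;
        if (x instanceof Pair) return ((Pair) x).car;
        throw new IllegalArgumentException("wrong-type-argument listp " + x);
    }

    public static void main(String[] args) {
        System.out.println(elispCar(NIL));                // nil
        System.out.println(elispCar(new Pair("a", NIL))); // a
        try {
            schemeCar(NIL);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());           // car: not a pair: nil
        }
    }
}
```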
Unicode and Internationalization
Java uses 16-bit Unicode characters.
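To make the 16-bit claim concrete: a Java char holds 16 bits, and a String is a sequence of such chars, so text in many scripts needs no special encoding inside the editor. In current Java versions a character beyond U+FFFF occupies two chars (a surrogate pair), which the last lines of this sketch show.

```java
class UnicodeDemo {
    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe \u65e5\u672c";   // "Grüße 日本"
        System.out.println(Character.SIZE);            // 16: bits per Java char
        System.out.println(text.length());             // 8 chars, one per character here
        // A character beyond U+FFFF needs two chars (a surrogate pair).
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
    }
}
```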
Status
Lots of stuff works already.