<article lang="en">
<artheader>
<title>Compiling XQuery to Java bytecodes</title>
<authorgroup>
<author>
<firstname>Per </firstname><surname>Bothner</surname>
<affiliation>
<address>
<email>per@bothner.com</email>
</address>
</affiliation>
</author>
</authorgroup>
</artheader>

 <sect1 id="Intro">
  <title>Introduction</title>
  <para>
XQuery is a new language currently being standardized
by the World Wide Web Consortium (W3C).
Its application domain is querying, filtering, and
generating XML files, or any data matching the XML infoset model.
There is a lot of industry and research interest in XQuery:
The <quote>database community</quote> is interested in XQuery as a query
language for XML databases, and the <quote>document community</quote>
is interested in querying collections of documents.
Database content tends to have a relatively simple and regular structure,
while documents have a more irregular and deeply-nested structure.
Most of the implementation effort currently appears to be
driven by existing database vendors, who want to improve
their XML offerings.
This leads to implementation strategies similar to existing
relational database implementations, such as optimizing to make
use of indexes, creating a query plan, and result-driven
(demand pull rather than data push) execution.</para>
<para>
Qexo is an implementation that is unusual in a number of aspects:</para>
<itemizedlist>
<listitem><para>
Qexo compiles a query to a general-purpose instruction set,
specifically Java bytecodes (which can be straightforwardly
compiled to machine code) rather than a <quote>plan</quote>
or other interpreted representation.
</para></listitem>
<listitem><para>
Execution flow more closely follows the <quote>natural</quote> program structure
rather than being driven by demand pull.</para></listitem>
<listitem><para>
Qexo is well integrated in Java, including access to arbitrary
Java objects and easily calling methods in Java's extensive class library.
</para></listitem>
<listitem><para>
Qexo supports textual XML files as well as a compact internal DOM.
We will touch on extending it to special-format or indexed databases,
but currently it is best suited for ad hoc queries of modest files,
or bulk processing
where execution time proportional to file size is acceptable.</para></listitem>
<listitem><para>
Qexo is Free GNU Software (open-source), written by an individual,
rather than a company or a research group.</para></listitem>
<listitem><para>
It is based on an existing multi-language framework,
with a compiler originally written in 1996 to compile the Scheme functional
language.</para></listitem>
</itemizedlist>
</sect1>

<sect1 id="Compilation">
<title>Compiling to Bytecodes</title>
<para>
Qexo's basic structure is based on the existing Kawa
<xref linkend="Kawa"/> framework,
of which Qexo is part.
(We use <quote>Qexo</quote> to refer to the XQuery-specific support in Kawa,
while <quote>Kawa</quote> refers to the framework as a whole regardless of language.)
The Kawa project started in 1996 with compiling the Scheme functional language 
to Java bytecodes.  Over the years it has developed into a more general
framework that can compile multiple languages.  For each supported language,
you can use Kawa in multiple <quote>modes</quote>, including
interactively typing expressions at a command-line prompt,
or compiling a <quote>query</quote> in different modes.</para>
<para>
Kawa supports both a compiler and an interactive <quote>interpreter</quote>.
The interpreter proper is very limited, as it is only used for the
simplest expressions.
Most programs are <quote>interpreted</quote> by compiling
them to bytecode
in an internal byte array, from which a class is then created on the fly
using Java's <classname>ClassLoader</classname> mechanism.
This implementation supports fast interactive response
without sacrificing performance.</para>
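<para>
The step from an in-memory byte array to a loaded class can be sketched
in plain Java.  This is only an illustration, not Kawa's code: the names
<classname>LiveClassDemo</classname>, <classname>ByteLoader</classname> and
<literal>Live</literal> are made up, and the <quote>compiler</quote> here
merely emits the bytes of an empty class.</para>
<programlisting><![CDATA[
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch only: builds the bytes of an empty class in memory and loads it
// without writing any file.  All names here are hypothetical.
class LiveClassDemo {
    // A ClassLoader that exposes the protected defineClass method.
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] code) {
            return defineClass(name, code, 0, code.length);
        }
    }

    // Emits a minimal class file: a public class extending Object,
    // with no fields and no methods.
    static byte[] emptyClass(String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);   // magic number
            out.writeShort(0);          // minor version
            out.writeShort(52);         // major version (Java 8)
            out.writeShort(5);          // constant pool count (entries 1..4)
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF(name);               // #2 Utf8 class name
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8
            out.writeShort(0x0021);     // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);          // this_class = #1
            out.writeShort(3);          // super_class = #3
            out.writeShort(0);          // no interfaces
            out.writeShort(0);          // no fields
            out.writeShort(0);          // no methods
            out.writeShort(0);          // no attributes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Compile" to a byte array, then turn the bytes into a live class.
    static Class<?> load(String name) {
        return new ByteLoader().define(name, emptyClass(name));
    }

    public static void main(String[] args) {
        Class<?> live = load("Live");
        System.out.println("loaded class " + live.getName());
    }
}
]]></programlisting>
<para>
A real compiler would of course also emit methods into the byte array;
the loading step stays the same.</para>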
</sect1>

<sect1 id="push-vs-pull">
<title>Lazy vs direct evaluation</title>
<para>
XQuery implementations by database people <xref linkend="BEA-XQRL03"/> tend
to be written using database techniques, where a query is compiled to a plan,
and then the result of the query is generated lazily when demanded
by the application that made the query.  This has the big advantage
that you only generate the results and perform the calculations
that are needed, and some optimizations fall out by themselves.
The disadvantage is that representing the state of a computation
requires non-trivial data structures and book-keeping:
You need a special-purpose interpreter to execute the plan,
thus getting an extra layer of interpretive overhead.</para>
<para>
It is instructive to consider non-strict functional programming languages
such as Haskell <xref linkend="Haskell"/>, whose specification
requires lazy demand-driven execution.
However, this is quite expensive, so optimizing implementations
perform <firstterm>strictness analysis</firstterm> <xref linkend="Wadler87"/>
to determine when it is safe to convert
the demand-driven execution to a direct <quote>eager</quote> execution.
The specification of XQuery permits either
implementation style, but experience from Haskell suggests that
direct execution will be easier to make efficient, at least when CPU use by
the query itself is a major factor.</para>
<para>
True, some queries using direct evaluation will take exponentially or
even infinitely longer than using lazy evaluation.  However, queries that
depend on lazy evaluation are not portable and should be re-written.
In contrast, I believe most queries can be implemented an order of
magnitude faster using direct rather than lazy evaluation.
This assumes that most of the execution time is executing the logic
of the query itself, rather than in library functions or reading from disk:
Lazy evaluation makes more sense if execution is likely to be I/O-bound.</para>
<para>
Best performance might need a combination of techniques.
You probably want to use lazy evaluation
for quantified <literal>some</literal> expressions,
selections using a numerical predicate, and some
functions, such as <function>fn:exists</function>.
I don't know if anyone has tried such a hybrid approach.</para>
<para>
Qexo mostly uses a more direct execution model.
The big advantage of this is that the state of the computation
maps directly and efficiently to the (virtual) machine's program counter
and execution stack.  No extra level of interpretation
is needed.</para>
</sect1>

<sect1 id="Streaming">
<title>Streaming</title>
<para>
Eager evaluation does not require that every value in the XQuery semantics
is realized as an object at run-time.
Qexo tries to stream sequences using an event-driven
interface like SAX.  Consider the following example.</para>
<programlisting>
for $i in (10, 20) return ($i+1, $i+2)
</programlisting>
<para>
This can be translated to:
</para>
<programlisting>
void main (Consumer output) {
  temp_1(10, output);
  temp_1(20, output);
}
void temp_1 (Object i, Consumer output) {
  output.writeItem(i + 1);
  output.writeItem(i + 2);
}
</programlisting>
<para>
Kawa's <literal>Consumer</literal> interface is an abstract <quote>data sink</quote>, which is
conceptually similar to SAX2's <classname>ContentHandler</classname>, but
generalized to forests of general values, as needed by the XQuery data model.
</para>
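<para>
A runnable version of this translation might look as follows.  It is a
sketch under simplifying assumptions: the nested
<classname>Consumer</classname> interface is a one-method stand-in for
Kawa's real interface, and the items are plain <literal>int</literal>
values rather than general XQuery values.</para>
<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;

// Sketch only: Consumer is a one-method stand-in, not Kawa's interface.
class PushDemo {
    interface Consumer {
        void writeItem(Object item);
    }

    // Body of the 'for' expression: pushes $i+1 and $i+2 to the sink.
    static void temp1(int i, Consumer output) {
        output.writeItem(i + 1);
        output.writeItem(i + 2);
    }

    // for $i in (10, 20) return ($i+1, $i+2)
    static void query(Consumer output) {
        temp1(10, output);
        temp1(20, output);
    }

    // Runs the query with a sink that simply collects the items.
    static List<Object> run() {
        List<Object> items = new ArrayList<>();
        query(items::add);
        return items;
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints [11, 12, 21, 22]
    }
}
]]></programlisting>
<para>
No intermediate sequence object is built for either
<literal>(10, 20)</literal> or <literal>($i+1, $i+2)</literal>; only the
final sink decides whether to store the items.</para>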
<para>
This is much more efficient than a demand-driven (client pull) translation of
this query, which has to use two cursors,
one for each sequence expression, to track which value to return next.
To lazily get the first result, we would first have to
request a value from <literal>($i+1, $i+2)</literal>,
which causes a request for the first value of <literal>$i</literal>
from <literal>(10, 20)</literal>.
When the client requests the next result, we need the next value
of <literal>($i+1, $i+2)</literal>, using the same value of <literal>$i</literal>.
For the next result, there are no more values in the <quote>inner</quote>
sequence, so it has to request a new value for <literal>$i</literal>,
before re-evaluating <literal>($i+1, $i+2)</literal>.
This bookkeeping overhead is substantial for applications that are CPU-bound,
but it is probably well worth it if it makes it easier to minimize
disk or network accesses.</para>
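<para>
For comparison, a hand-written pull-style version of the same query could
look like the sketch below.  The class names are hypothetical and no real
implementation is claimed to work exactly this way; the point is only that
the two cursors have to be kept as explicit state between calls.</para>
<programlisting><![CDATA[
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch only: a demand-driven iterator for
//   for $i in (10, 20) return ($i+1, $i+2)
class PullDemo {
    static class ForIterator implements Iterator<Integer> {
        private final int[] outer = {10, 20};
        private int outerPos = 0;   // cursor into (10, 20)
        private int innerPos = 0;   // cursor into ($i+1, $i+2)

        public boolean hasNext() {
            return outerPos < outer.length;
        }

        public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            int i = outer[outerPos];
            int result = i + 1 + innerPos;   // innerPos 0 gives $i+1, 1 gives $i+2
            innerPos++;
            if (innerPos == 2) {             // inner sequence exhausted:
                innerPos = 0;                // restart it and
                outerPos++;                  // advance to the next $i
            }
            return result;
        }
    }

    // Pulls every result and joins them with spaces.
    static String run() {
        StringBuilder sb = new StringBuilder();
        Iterator<Integer> it = new ForIterator();
        while (it.hasNext()) sb.append(it.next()).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 11 12 21 22
    }
}
]]></programlisting>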
</sect1>

<sect1 id="Structure">
  <title>Compiler overview</title>
<para>
Qexo compiles an XQuery module as follows:</para>
<orderedlist>
<listitem><para><emphasis>Parsing.</emphasis>
Qexo has a hand-written recursive-descent parser.
It keeps track of line and column numbers.
</para>
<para>
The result from the parser is an <classname>Expression</classname>,
which is a language-independent abstract syntax tree.
Some special XQuery forms, such as FLWOR expressions, are represented
as calls to special built-in functions.  We'll see examples later.</para>
</listitem>
<listitem><para><emphasis>QName expansion.</emphasis>
Resolving namespace prefixes to namespace URIs must be done after
parsing, because a namespace prefix can be used in an element constructor
before it is defined by a namespace attribute.</para></listitem>
<listitem><para><emphasis>Module import.</emphasis>
Importing library modules causes some complications.
At the time of writing the public XQuery drafts are inconsistent.
Until these issues are resolved it is premature
to say too much about module import.</para></listitem>
<listitem><para><emphasis>Name resolution.</emphasis>
Resolve variable references and function calls to their definitions.</para></listitem>
<listitem><para><emphasis>Analysis and optimization passes.</emphasis>
There are a number of passes that work on
the <classname>Expression</classname> tree.
Calls to certain built-in functions (such as basic arithmetic)
are re-written to more efficient forms.
We do some ad hoc type propagation.
We figure out how functions can be compiled into methods or inlined,
and how variables are assigned to virtual machine registers or fields.
New query optimization passes can be added here.
</para></listitem>
<listitem><para><emphasis>Code generation.</emphasis>
Qexo generates bytecode by recursively traversing
the abstract syntax tree.  We can generate bytecode in different
modes, depending on how it is to be used and specified options.
</para></listitem>
<listitem><para><emphasis>Output.</emphasis>
The bytecode can be written out to a <literal>class</literal> file.
Alternatively, a <classname>ClassLoader</classname> can take the bytecode,
as stored in a <classname>byte</classname> array, and directly create
a <quote>live</quote> class, without writing out any files.</para>
</listitem>
</orderedlist>
<!--
<mediaobject  id="Diagram1">
  <imageobject  role="html">
    <imagedata  format="PNG"  fileref="Diagram1.png"/>
  </imageobject>
  <imageobject role="latex">
    <imagedata format="PSTRICKS"  fileref="Diagram1.tex"/>
  </imageobject>
  <imageobject  role="latex">
    <imagedata  format="EPS"  fileref="Diagram1.eps"/>
  </imageobject>
</mediaobject>
-->
</sect1>

<sect1 id="Expressions">
  <title>Expressions</title>
<para>
Qexo, like other XQuery implementations, translates XQuery surface
syntax into a simplified <quote>core XQuery</quote>.  Unlike other
implementations, the core representation is not designed for XQuery,
but uses a nested tree of language-independent <classname>Expression</classname> objects.
Kawa has a small number of sub-classes of the abstract
<classname>Expression</classname> class, including ones used for
constants, variable references, anonymous function values, lexical
scoping blocks, and function application.
Special XQuery forms are represented as calls to built-in functions.
For example:</para>
<programlisting>
&lt;p&gt;sum: {3+4}&lt;/p&gt;
</programlisting>
<para>
This is converted into a data structure that
has the following structure:
</para>
<programlisting>
ApplyExp[
  function: makeElement,
  args: {QuoteExp[value: "p"],
         QuoteExp[value: "sum: "],
         ApplyExp[
           function: +,
           args: {QuoteExp[value: 3],
                  QuoteExp[value: 4]}]}]
</programlisting>
<para>
An <classname>ApplyExp</classname> is used for procedure application.
Its <literal>function</literal> property specifies the procedure to call;
and its <literal>args</literal> property is an array of parameter expressions.
A <classname>QuoteExp</classname> wraps a literal Java object,
and turns it into a constant expression that always evaluates to that object.
The <function>makeElement</function> function takes an element tag, followed
by zero or more attribute expressions, followed by zero or
more expressions for the children.
</para>
<para>
More complex control structures may have sub-expressions that need
to be evaluated out of order.  We handle these by wrapping them
in an anonymous function, represented by a <classname>LambdaExp</classname>.
Consider for example:</para>
<programlisting>
for $i in (2, 3) return $i+10
</programlisting>
<para>
This is represented by:</para>
<programlisting>
ApplyExp[
  function: valuesMap,
  args: {
    LambdaExp[
      params: {$i},
      body: ApplyExp[
        function: +,
        args: {ReferenceExp[$i], QuoteExp[value: 10]}]],
    ApplyExp[
      function: appendValues,
      args: {QuoteExp[value: 2], QuoteExp[value: 3]}]}]
</programlisting>
<para>
The built-in <function>valuesMap</function> takes two arguments:
a function, and a sequence.  It applies the function to each
item of the sequence, returning the sequence of the concatenated results.
A simple evaluation of this expression yields the correct result, but
does so inefficiently; below we will show some ways Qexo optimizes
such expressions.</para>
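<para>
The <quote>simple evaluation</quote> of <function>valuesMap</function>
can be sketched in Java as follows.  This is an illustrative stand-in
that models sequences as <classname>java.util.List</classname> objects;
it is not Kawa's implementation.  Note that it materializes both the
input sequence and every intermediate result, which is exactly the cost
the optimizations try to avoid.</para>
<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch only: sequences are modelled as java.util.List.
class ValuesMapDemo {
    // Applies fn to each item and concatenates the resulting sequences.
    static List<Object> valuesMap(Function<Object, List<Object>> fn,
                                  List<Object> seq) {
        List<Object> result = new ArrayList<>();
        for (Object item : seq) {
            result.addAll(fn.apply(item));
        }
        return result;
    }

    // for $i in (2, 3) return $i+10
    static List<Object> run() {
        List<Object> seq = new ArrayList<>();   // plays the role of appendValues(2, 3)
        seq.add(2);
        seq.add(3);
        return valuesMap(i -> {
            List<Object> r = new ArrayList<>(); // one-item result sequence
            r.add((Integer) i + 10);
            return r;
        }, seq);
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints [12, 13]
    }
}
]]></programlisting>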
</sect1>

<sect1 id="Targets">
  <title>Code generation</title>
<para>
To compile an <classname>Expression</classname>,
the Qexo compiler invokes its <literal>compile</literal> method:</para>
<programlisting>
public abstract void
compile (Compilation comp, Target target);
</programlisting>
<para>
The <classname>Compilation</classname> parameter manages the state of
the current compilation, including the current method being generated.
When <literal>compile</literal> is invoked on an
<classname>Expression</classname>, it will append
to the current method bytecode instructions to evaluate
the <classname>Expression</classname>.
What is the best strategy for doing
so, and where to leave the result of the <classname>Expression</classname>,
may depend on the expression's context.
Kawa uses a simple and effective convention: when an outer expression
needs to compile a sub-expression, it passes to the latter's
<literal>compile</literal> method a <literal>Target</literal> object
that specifies what the sub-expression should do with its result.</para>
<para>
The default <literal>Target</literal> expects the result to be
pushed onto the JVM stack as an <classname>Object</classname> reference.
That is, if such a
target is passed to a <literal>compile</literal> method for an expression,
that method is responsible for evaluating the expression and leaving
the result on the JVM stack, where the caller can make use of it.
If some other  <literal>Target</literal> is passed to a
<literal>compile</literal> method, then the method must send the
result to the given <classname>Target</classname>.  The easiest way to do this is to
leave the result on the JVM stack, and then call the
<literal>Target</literal>'s <literal>compileFromStack</literal> method,
which is responsible for moving the result from the JVM stack to
the desired target.  (For the default <literal>Target</literal> the
<function>compileFromStack</function> method does nothing, since its caller
has left the result where it needs to go.)
Thus a <literal>compile</literal> method only needs to be able to evaluate a
result and leave it on the JVM stack; it can handle
other kinds of <literal>Target</literal>s by just calling their
<function>compileFromStack</function> method.
However, it has the
<emphasis>option</emphasis> of inspecting the passed-in
<literal>Target</literal> if that may lead to more efficient code.</para>
<para>
For example, the simplest kind of <literal>Target</literal> is an
<literal>IgnoreTarget</literal>, which is used when an expression is
evaluated for its side-effects, but the result will be ignored.
(This isn't useful for XQuery, but it is used by Scheme and
other languages.)  The <literal>IgnoreTarget</literal>'s
<literal>compileFromStack</literal> method just pops the result
from the JVM stack and ignores it.  If an expression has no side-effects
and its <literal>compile</literal> method was passed
an <literal>IgnoreTarget</literal> it generates no code.</para>
<para>
The <literal>compileFromStack</literal> method of a 
<literal>ConditionalTarget</literal> is more interesting.  It
pops off a value, converts it
to a boolean value (in a language-dependent manner), and then jumps to
either of two labels depending on whether the value is true or false.
When Kawa compiles a conditional (<literal>if</literal>) expression,
it creates a <literal>ConditionalTarget</literal> for compiling the
test expression.  This makes it easy to optimize boolean expressions as jumps.
</para>
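<para>
The <literal>Target</literal> convention can be illustrated with a toy
code generator that emits textual pseudo-instructions instead of real
bytecode.  Everything in this sketch is an assumption made for
illustration: the <classname>Code</classname> class, the instruction
names, and the simplification that the value tested by the conditional
target is already an integer 0 or 1.  Kawa's actual classes are richer.</para>
<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.List;

// Sketch only: a toy model of the Target convention.
class TargetDemo {
    // Collects "instructions" as text; stands in for a bytecode emitter.
    static class Code {
        final List<String> ops = new ArrayList<>();
        void emit(String op) { ops.add(op); }
    }

    interface Target {
        // Called after the value has been left on the stack.
        void compileFromStack(Code code);
    }

    // Default target: the value is already where it should be.
    static final Target STACK = code -> { };
    // Ignore target: discard the value.
    static final Target IGNORE = code -> code.emit("pop");

    // Conditional target: jump to one of two labels.
    static Target conditional(String ifTrue, String ifFalse) {
        return code -> {
            code.emit("ifeq " + ifFalse);   // zero means false
            code.emit("goto " + ifTrue);
        };
    }

    // compile method of a constant: inspects the target as an optimization.
    static void compileConstant(int value, Code code, Target target) {
        if (target == IGNORE) return;       // no side effects, so emit nothing
        code.emit("push " + value);
        target.compileFromStack(code);
    }

    // compile method of a call: always evaluates, then defers to the target.
    static void compileCall(String method, Code code, Target target) {
        code.emit("invoke " + method);
        target.compileFromStack(code);
    }

    static List<String> run() {
        Code code = new Code();
        compileConstant(1, code, IGNORE);                 // emits nothing
        compileCall("f", code, IGNORE);                   // call, then pop
        compileCall("g", code, conditional("L1", "L2"));  // call, then branch
        compileConstant(7, code, STACK);                  // just push
        return code.ops;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
]]></programlisting>
<para>
Only <literal>compileConstant</literal> looks at its target; 
<literal>compileCall</literal> relies entirely on
<literal>compileFromStack</literal>, which is the fallback the convention
guarantees will always work.</para>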
<para>
We'll look at <classname>ConsumerTarget</classname> and
<classname>SeriesTarget</classname> in the next sections.</para>
</sect1>

<sect1 id="FLWR">
  <title>Optimizing <literal>for</literal> expressions</title>
<para>
Much of XQuery's power comes from <quote>FLWOR</quote> expressions, and
compiling them efficiently is a challenge.
To avoid materializing the whole <literal>for</literal> clause
sequence as an object, Qexo uses a special <classname>Target</classname>
when compiling the <literal>for</literal> expression.
In the case of a <classname>SeriesTarget</classname> the
expression is evaluated in a mode where each item in the resulting
sequence calls a given function.  In the case of a <literal>FLWOR</literal>
expression, the function is the anonymous function representing
the <literal>return</literal> clause.   Consider the earlier example:</para>
<programlisting>
for $i in (10, 20) return ($i+1, $i+2)
</programlisting>
<para>
Qexo compiles <literal>(10, 20)</literal> with a
<classname>SeriesTarget</classname> that references the anonymous
function:</para>
<programlisting>
function($i) { ($i+1, $i+2) }
</programlisting>
<para>
Compiling <literal>(10, 20)</literal> with
a <classname>SeriesTarget</classname> is a matter of compiling
first <literal>10</literal> and then <literal>20</literal> with the
same <classname>SeriesTarget</classname> and putting the bytecode
for the two pieces in sequence. Compiling <literal>10</literal>
(or any singleton expression) is then just a matter of evaluating
the value and calling the anonymous function.</para>
<para>
The <literal>return</literal> clause function is implemented using the
<quote>internal subroutine</quote> instructions (<literal>jsr</literal> and
<literal>ret</literal>) that Java traditionally uses
for <literal>finally</literal> clauses.  This allows direct and efficient
access to surrounding variables, and it's an interesting use of a feature
of the JVM that is not accessible from the Java source language.</para>
<para>
Qexo currently still needs to materialize the sequence in the case of more
complex <literal>for</literal> clauses.  Optimizing the general case is
not yet done, but it is fairly easy to do at the cost of allocating
an inner class instance.  Consider for example:</para>
<programlisting>
for $x in f($arg) return use($x)
</programlisting>
<para>
It can be compiled to the following<!-- (where I've simplified the
API assumimg that the implicit parameter is a <classname>Consumer</classname>,
rather than a <classname>CallContext</classname> that points
to a <classname>Consumer</classname>)-->:</para>
<programlisting>
void main(Consumer out) {
  f(arg, new Consumer() {
           void writeItem(Object x) {
             use(x, out);
           }
         });
}<!--
XXX
void main(CallContext ctx) {
  Consumer out = ctx.consumer;
  ctx.consumer = 
    new class {
      void writeItem(Object x) {
        use(x, ctx);
    };
  f(arg, ctx);
  ctx.consumer = out;
}-->
</programlisting>
<para>
The idea is that each time <literal>f</literal> yields an item,
it calls the <function>writeItem</function> of its passed-in
<classname>Consumer</classname>.  That happens to be the
unnamed inner class shown above, where the body of the
<function>writeItem</function> method is
the compilation of the FLWOR's <literal>return</literal> expression.
This calls the <literal>use</literal> function, passing it the
outer (original) <classname>Consumer</classname>.  This mechanism
can handle general <literal>for</literal> expressions without
materializing the <literal>for</literal> sequence, and with
little overhead.</para>
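<para>
A self-contained version of this scheme is sketched below.  The
<classname>Consumer</classname> interface is again a one-method stand-in,
and the bodies of <literal>f</literal> and <literal>use</literal> are
arbitrary placeholders chosen only so that the example produces visible
output.</para>
<programlisting><![CDATA[
// Sketch only: Consumer, f and use are illustrative stand-ins.
class ForChainDemo {
    interface Consumer {
        void writeItem(Object item);
    }

    // f($arg): placeholder that yields arg, 2*arg and 3*arg.
    static void f(int arg, Consumer out) {
        for (int k = 1; k <= 3; k++) {
            out.writeItem(arg * k);
        }
    }

    // use($x): placeholder that yields one string per input item.
    static void use(Object x, Consumer out) {
        out.writeItem("use(" + x + ")");
    }

    // for $x in f($arg) return use($x)
    static void query(int arg, final Consumer out) {
        f(arg, new Consumer() {
            // Body of the 'return' clause; writes to the outer Consumer.
            public void writeItem(Object x) {
                use(x, out);
            }
        });
    }

    static String run() {
        StringBuilder sb = new StringBuilder();
        query(5, item -> sb.append(item).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints use(5);use(10);use(15);
    }
}
]]></programlisting>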
</sect1>

<sect1 id="Functions">
  <title>Compiling functions</title>
<para>
An XQuery function is compiled to a Java method whose name is generated
from the function name.  A query body is treated as a zero-argument
function which we here call <literal>main</literal>.</para>
<para>
Each XQuery formal parameter results in a corresponding
formal parameter in the generated method.  In addition there is
a compiler-generated <literal>out</literal> parameter,
which has type <classname>Consumer</classname>.
The result of the function is written to this <classname>Consumer</classname>;
hence the generated method's return type is <classname>void</classname>.</para>
<para>
Here is a simple example function:</para>
<programlisting>
declare function my-func ($delta, $x) {
  if ($delta=0)
  then $x
  else ($x + $delta, $x - $delta)
}
</programlisting>
<para>
This is compiled to:</para>
<programlisting>
void myFunc(Object delta, Object x,
            Consumer out) {
  if (NumEqual(delta, 0))
    out.writeItem(x);
  else {
    out.writeItem(NumAdd(x, delta));
    out.writeItem(NumSub(x, delta));
  }
}
</programlisting>
<para>
<literal>NumEqual</literal>, <literal>NumAdd</literal>,
and <literal>NumSub</literal> are static methods in the runtime
library; with appropriate type declarations Kawa can generate more
specific code or inlined arithmetic.</para>
<para>
Generating code like this is straightforward. Kawa creates a
<classname>ConsumerTarget</classname> that contains the name
(actually virtual register number) of the <literal>out</literal> temporary,
and passes this <classname>ConsumerTarget</classname> instance
to the <literal>compile</literal> method for the function's body.
The same <classname>ConsumerTarget</classname> instance gets passed
on when compiling the conditional and sequence sub-expressions.</para>
<para>
The Qexo environment creates the initial <classname>Consumer</classname>
that it passes to the <literal>main</literal> function.
What kind of <classname>Consumer</classname> to pass depends on
how the query is invoked. By default Qexo writes out the result of a query
to the standard output stream using XQuery serialization.  To do that, it
allocates an instance of a <classname>Consumer</classname> subclass such
that methods like <function>writeItem</function> call appropriate
output functions.</para>
<para>
Compiling a function call is simple.
The actual parameters are compiled with a default
<classname>Target</classname>, leaving the result on the JVM stack.
If the target for the function call as a whole is a
<classname>ConsumerTarget</classname>, we just pass the current
<classname>CallContext</classname> and <classname>Consumer</classname>
to the method as the context parameter.
Otherwise, the compiler generates code to collect the output
from the function (which writes to a <classname>Consumer</classname>)
into a sequence object. A <classname>TreeList</classname> helper
class makes this simple and reasonably efficient.</para>
<para>
More efficient function calls can be done with global analysis,
which can cause functions to be inlined or use an optimized calling
convention.  Kawa does some of this, but the current focus is aimed
at Scheme, where nested and anonymous functions are more of a priority.</para>

<sect2 id="tail-calls">
  <title>Tail calls</title>
<para>
A <firstterm>tail-call</firstterm> is a function call that is the
last expression executed in a function body.
It is desirable to optimize tail-calls so that they can execute without
growing the call frame stack.  This allows many recursive functions
to execute on large data sets (such as long sequences) without
running out of stack space.
Unfortunately, the Java virtual machine does not optimize tail-calls,
so a direct mapping of XQuery function calls to Java method invocations
will not optimize tail-calls.</para>
<para>
The solution is to split up a function call into three parts:</para>
<orderedlist>
<listitem><para>Evaluate the argument expressions, and leave the result in a
well-known location.  Leave a reference to the function we want to call
in another well-known location.</para></listitem>
<listitem><para>
Return from the method that implements the calling function,
which releases its stack frame.</para></listitem>
<listitem><para>
A generic driver calls the function, as specified in the second well-known
location, using the previously saved argument values.</para></listitem>
</orderedlist>
<para>
For <quote>well-known locations</quote> we could use static fields,
but that would not work for multiple threads.  Instead, we use
non-static fields of <classname>CallContext</classname> class,
and use a separate <classname>CallContext</classname> instance for each thread.
The current thread's <classname>CallContext</classname> is accessible as
a <classname>ThreadLocal</classname> variable, but for performance
we pass it along on each function call as an implicit parameter.
</para>
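<para>
The three steps can be sketched as a small <quote>trampoline</quote>.
The class and field names below (<literal>proc</literal>,
<literal>arg1</literal>, <literal>arg2</literal>,
<literal>result</literal>, <literal>runUntilDone</literal>) are
hypothetical and much simpler than Kawa's real
<classname>CallContext</classname>; results are also returned through a
field here rather than written to a <classname>Consumer</classname>.</para>
<programlisting><![CDATA[
// Sketch only: a minimal trampoline; not Kawa's CallContext.
class TailCallDemo {
    interface Procedure {
        void apply(CallContext ctx);
    }

    static class CallContext {
        Procedure proc;    // well-known location: function to call next
        long arg1, arg2;   // well-known location: saved argument values
        long result;

        // Step 3: the generic driver keeps calling until no call is pending.
        void runUntilDone() {
            while (proc != null) {
                Procedure p = proc;
                proc = null;
                p.apply(this);
            }
        }
    }

    // sum(n, acc) = if n = 0 then acc else sum(n - 1, acc + n)
    static void sum(CallContext ctx) {
        long n = ctx.arg1;
        long acc = ctx.arg2;
        if (n == 0) {
            ctx.result = acc;
            return;
        }
        ctx.arg1 = n - 1;               // step 1: save the arguments
        ctx.arg2 = acc + n;
        ctx.proc = TailCallDemo::sum;   //         and the function to call
        // step 2: returning here releases this stack frame
    }

    static long run(long n) {
        CallContext ctx = new CallContext();   // one instance per thread
        ctx.arg1 = n;
        ctx.arg2 = 0;
        ctx.proc = TailCallDemo::sum;
        ctx.runUntilDone();
        return ctx.result;
    }

    public static void main(String[] args) {
        // A million nested calls would risk a StackOverflowError if
        // mapped directly to recursive Java method invocations.
        System.out.println(run(1000000));   // prints 500000500000
    }
}
]]></programlisting>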
<para>
In previous sections we indicated that each function gets an implicit
<classname>Consumer</classname> parameter.  That is not quite right.
The implicit parameter is actually a <classname>CallContext</classname>,
which has a field pointing to the current <classname>Consumer</classname>.
So the <literal>myFunc</literal> function above actually is compiled thus:
</para>
<programlisting>
void myFunc(Object delta, Object x,
            CallContext ctx) {
  Consumer out = ctx.consumer;
  ...
}
</programlisting>
</sect2>

<sect2 id="procedures">
  <title>Procedure values</title>
<para>
Qexo creates a field for each XQuery function, which contains
a <classname>Procedure</classname> object for referencing the
function as a value.
Being able to use a function as a value is essential for functional languages,
such as Scheme, but it isn't strictly needed for XQuery.
However, Qexo uses function values to implement tail-call elimination,
discussed above.  Also, function-specific optimizations (discussed below) are
implemented using special methods of the <classname>Procedure</classname>.
</para>
</sect2>

<sect2 id="inlining">
  <title>Inlining</title>
<para>
Kawa provides two hooks that the compiler
can use to optimize or customise a function call.
When the compiler processes a function call, it checks if the called
function is a known procedure, and if so whether the procedure implements
either the <classname>CanInline</classname> or the <classname>Inlineable</classname>
interface.</para>
<para>
If a <classname>Procedure</classname> implements the
<classname>CanInline</classname> interface, the compiler
calls its <literal>inline</literal> method at tree rewriting time,
passing in the <classname>Expression</classname> for the function call.
The <literal>inline</literal> method returns a new <classname>Expression</classname>
that replaces the original.</para>
<para>
If the procedure is a pure function and the arguments are constants,
it can replace the call by the result value.  More commonly, it will
know the argument types at this point so it can replace the call by
a type-specific variant.  For example, it may replace a call to a
generic function such as addition with an invocation of a known Java method.
Another example is the <literal>invoke</literal> library function 
which takes an object expression, a string expression that names a method,
and other parameters.  The <literal>inline</literal> method of
<literal>invoke</literal> attempts to resolve it to a call to a specific
method, which can be compiled much more efficiently.</para>
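<para>
As an illustration of such a tree rewrite, the sketch below folds an
addition of two integer constants into a single constant.  The
<classname>Expression</classname> classes here are drastically
simplified stand-ins, and <literal>inlinePlus</literal> is a hypothetical
hook, not the signature of Kawa's <classname>CanInline</classname>
interface.</para>
<programlisting><![CDATA[
// Sketch only: simplified stand-ins for the Expression classes.
class InlineDemo {
    static abstract class Expression { }

    // A constant expression.
    static class QuoteExp extends Expression {
        final Object value;
        QuoteExp(Object value) { this.value = value; }
    }

    // A function application.
    static class ApplyExp extends Expression {
        final String function;
        final Expression[] args;
        ApplyExp(String function, Expression... args) {
            this.function = function;
            this.args = args;
        }
    }

    // Hypothetical rewrite hook for "+": if both arguments are integer
    // constants, replace the call by its value; otherwise keep the call.
    static Expression inlinePlus(ApplyExp call) {
        if (call.args.length == 2
                && call.args[0] instanceof QuoteExp
                && call.args[1] instanceof QuoteExp) {
            Object a = ((QuoteExp) call.args[0]).value;
            Object b = ((QuoteExp) call.args[1]).value;
            if (a instanceof Integer && b instanceof Integer) {
                return new QuoteExp((Integer) a + (Integer) b);
            }
        }
        return call;
    }

    // Rewrites the tree for 3+4 and returns the folded constant.
    static Object run() {
        Expression e = inlinePlus(
            new ApplyExp("+", new QuoteExp(3), new QuoteExp(4)));
        return ((QuoteExp) e).value;
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 7
    }
}
]]></programlisting>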
<para>
The <classname>Inlineable</classname> interface is used
during code generation.  Before generating bytecode for a function call,
Kawa checks if the
called function is known and implements <classname>Inlineable</classname>.
If so, instead of generating general-purpose bytecode to evaluate the
arguments and then call the function, Kawa calls the procedure's
<literal>compile</literal>  method which then is responsible for code
generation.  This allows instruction-level customization.
For example, if the operands to the addition operator are primitive
(non-object) 32-bit integers, the <literal>compile</literal> method
can emit a single <literal>iadd</literal> instruction to add them.</para>
<para>
The <literal>compile</literal> method can also do special control flow.
For example, the <classname>ValuesMap</classname> class is used to
represent a <literal>for</literal> expression.  It calls a function
(normally an anonymous known <quote>lambda expression</quote> representing
the <literal>return</literal> clause), once for each item in a sequence,
and concatenates the results. Its <literal>compile</literal> method
attempts to inline the call to an efficient loop.</para>
</sect2>
</sect1>

<sect1 id="Nodes">
<title>Node representation</title>
<para>
There are a number of ways one could represent a node in Java.
The obvious way is to use W3C DOM's standard <classname>Node</classname>
interface,
but this requires one object per node, and (unless you're quite clever)
lots of pointers.  This is expensive in terms of space, construction time,
locality, and GC traversal overhead.
Qexo represents a node as a pair consisting of an object that
extends <classname>AbstractSequence</classname>
plus a 32-bit integer <quote>position</quote>.
The integer identifies a particular node or position;
it is a magic cookie that only has meaning in the context of its owning
<classname>AbstractSequence</classname>.  Since the position is a resource
that is managed by the <classname>AbstractSequence</classname>,
there is no problem with a database containing more
than 2<superscript>32</superscript> nodes, as long as clients only need to
reference at most 2<superscript>32</superscript> of them at a time.</para>
<para>
<classname>AbstractSequence</classname> is an abstract class,
which is used for many purposes: nodes, sequences, Scheme lists and vectors.
The <classname>NodeTree</classname> sub-class is used for nodes.
It stores an entire document or document fragment in two arrays:
a character array,
and an <classname>Object</classname> array.
<quote>Pointers</quote> between nodes are relative indexes
stored in the character array using one or two 16-bit characters.
The representation uses a <quote>buffer gap</quote> which allows
efficient insertion and deletion of nodes near the gap.
This representation is very compact, easy to append to, and supports
efficient navigation (though some tuning of the basic design may be
worthwhile).  A position cookie is just an index into the character array.
This works fine for read-only nodes.  For modifiable nodes and documents
we use an indirection table:  the indexes in the indirection array are used
for the magic cookies, while the values of the array are indexes
into the <classname>NodeTree</classname>'s character array.</para>
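<para>
The indirection idea can be sketched as follows.  This is a minimal
model under stated assumptions, not <classname>NodeTree</classname>'s
actual layout: the document is a plain
<classname>StringBuilder</classname>, the hypothetical
<classname>PositionTable</classname> updates every entry linearly on
insertion, and cookies are never released.</para>
<programlisting><![CDATA[
import java.util.Arrays;

// Sketch only: stable position cookies over a shifting character array.
class PositionTableDemo {
    static class PositionTable {
        private int[] index = new int[8];   // cookie -> index into the data
        private int count = 0;

        // Hands out a new cookie that refers to the given data index.
        int allocate(int dataIndex) {
            if (count == index.length) {
                index = Arrays.copyOf(index, count * 2);
            }
            index[count] = dataIndex;
            return count++;
        }

        // Translates a cookie to the current data index.
        int resolve(int cookie) {
            return index[cookie];
        }

        // Must be called when 'length' characters are inserted at 'where':
        // every position at or after the insertion point moves up.
        void inserted(int where, int length) {
            for (int c = 0; c < count; c++) {
                if (index[c] >= where) index[c] += length;
            }
        }
    }

    static String run() {
        StringBuilder data = new StringBuilder("abcdef");
        PositionTable table = new PositionTable();
        int cookie = table.allocate(3);   // refers to 'd'
        data.insert(1, "XY");             // data is now "aXYbcdef"
        table.inserted(1, 2);
        int where = table.resolve(cookie);
        return cookie + ":" + where + ":" + data.charAt(where);
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 0:5:d
    }
}
]]></programlisting>
<para>
The cookie held by the client stays the same across the modification;
only the table entry behind it changes.</para>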
<para>
The XMark <quote>standard</quote> 100MB test file (116 million bytes)
is read by Qexo into an array of 104 million 16-bit
Java characters, plus a 200-element object array of pointers to shared
element and attribute names.  It took a little over a minute to
read the file, on a 1GHz PowerBook with 512MB of memory.
Simple XPath selections using this representation run very quickly.
In contrast, Saxon 7.9.1 needed about twice as big a heap, and took almost
8 times as long, largely due to increased paging. (The <quote>user</quote> process time
was only 50% more with Saxon.)
</para>
<para>
To handle large persistent or remote databases, Qexo would need
a new class derived from <classname>AbstractSequence</classname>.
This class would handle caching
and communication with the database.  It would manage position integers which
could be database keys or other proxies for the actual database nodes.
This would not be a trivial task:
there are some places in Qexo that assume <classname>NodeTree</classname>,
and they would have to be generalized.  Making use of indexes would
require teaching the Qexo optimizer about them.  Updates and transactions
would bring in a whole new set of issues.</para>
<para>
For convenience, Kawa provides a set of wrapper classes that
implement the W3C DOM interfaces.  For example, the class
<classname>KNode</classname> implements the
<classname>org.w3c.dom.Node</classname> interface.
A <classname>KNode</classname> object has two fields:  a reference to an
<classname>AbstractSequence</classname> container, and
a 32-bit integer position. A <classname>KNode</classname> does not carry
node identity, and can be quite transitory.  It is used when a node
needs to be represented as an <classname>Object</classname>.</para>
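<para>
The shape of such a wrapper can be sketched in a few lines of Java.
This is a hypothetical stand-in, not the source of
<classname>KNode</classname>: the name <classname>NodeHandle</classname>
is invented, a plain <classname>Object</classname> stands in for the
<classname>AbstractSequence</classname> container, and the sketch assumes
for simplicity that equal position values in the same container denote
the same node.</para>
<programlisting>
/** Hypothetical stand-in for a node wrapper; not Kawa's KNode source. */
final class NodeHandle {
    final Object container;  // stands in for the AbstractSequence
    final int position;      // position cookie, meaningful only to the container

    NodeHandle(Object container, int position) {
        this.container = container;
        this.position = position;
    }

    /** Two handles denote the same node if container and position agree
        (a simplifying assumption of this sketch). */
    boolean sameNode(NodeHandle other) {
        return container == other.container &amp;&amp; position == other.position;
    }
}
</programlisting>
<para>
Because the wrapper object itself carries no identity, two distinct
<classname>NodeHandle</classname> objects created for the same container
and position refer to the same node, so node comparison has to look at
the two fields rather than at object identity.</para>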
</sect1>

 <sect1 id="Extensions">
  <title>Extensions</title>
<para>Qexo has some non-standard extended features.
Here are some of the more interesting ones.</para>
 <sect2 id="JMethods">
  <title>Calling Java Methods</title>
<para>
Qexo (following many XSLT implementations) uses special namespaces to
name Java classes. For example:</para>
<programlisting>
declare namespace JInt = "class:java.lang.Integer";
JInt:toHexString(255)
</programlisting>
<para>
This invokes the static <function>toHexString(int)</function>
method in the Java class <classname>java.lang.Integer</classname>,
evaluating to the string <literal>"ff"</literal>.
You can also invoke non-static methods (passing the <literal>this</literal>
receiver as the first parameter), or construct new objects (using
<function>new</function> as the method name).  The compiler picks the
best matching method using the available type information.</para>
<para>
As a further convenience, you can use a class name directly as
a prefix, provided there is no matching in-scope namespace
and the class is on the compile-time classpath.
For example:</para>
<programlisting>
java.lang.Object:toString(3+4)
</programlisting>
<para>
This calls the <function>toString</function> method of the
object representing 7, yielding the string <literal>"7"</literal>.</para>
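<para>
In plain Java terms, the two examples above correspond roughly to the
calls below.  The class name <classname>JavaCallEquivalents</classname>
is hypothetical, and boxing the integer as a
<classname>java.lang.Integer</classname> is a simplifying assumption;
Qexo may represent the number with a class of its own.</para>
<programlisting>
/** Hypothetical illustration of what the XQuery calls map to in Java. */
class JavaCallEquivalents {
    /** JInt:toHexString(255) invokes a static method. */
    static String hex() {
        return Integer.toHexString(255);
    }

    /** java.lang.Object:toString(3+4) passes the receiver as the first argument. */
    static String seven() {
        // Assumption: the value 7 is boxed as a java.lang.Integer.
        Object receiver = Integer.valueOf(3 + 4);
        return receiver.toString();
    }
}
</programlisting>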
</sect2>

 <sect2 id="Servlets">
  <title>Servlets</title>
<para>
Kawa has built-in support for automatically compiling an XQuery
query (or a program in another Kawa-supported language) to a servlet.
A <firstterm>servlet</firstterm> is a kind of Java class that is executed in
an appropriate web server in response to HTTP requests.
The result of the query becomes the HTTP response.
Here is a trivial but valid servlet:</para>
<programlisting>
&lt;html&gt;&lt;body&gt;
  &lt;p&gt;&lt;b&gt;Hello&lt;/b&gt; world!&lt;/p&gt;
&lt;/body&gt;&lt;/html&gt;
</programlisting>
<para>Qexo has standard functions for querying the HTTP request and
setting HTTP response parameters.
There is also a helper class that automatically compiles an XQuery
source file to a servlet class whenever the source is updated.
(Kawa just caches the compiled class internally.  Because Kawa's compiler
is so fast, there is little point in saving the compiled class
on disk, the way JSP does.)
This provides a simple low-overhead way of writing
<quote>web applications</quote>.</para>
</sect2>
<sect2 id="Interactive">
<title>Interactive console</title>
<para>
You can interactively type commands at a <quote>console</quote>:</para>
<programlisting>
(: 1 :) declare variable $ten { 10 };
(: 2 :) declare function scale ($x) { $ten * $x };
(: 3 :) scale(3)
30
(: 4 :) declare variable $hundred { scale($ten) }
(: 5 :) scale($hundred)
1000
</programlisting>
<para>
The command prompt includes line numbers, to help with error messages,
and is in the form of a comment, to aid cut-and-paste.
Declarations add names to the console state, while expressions
are evaluated and their result printed.
Semicolons after declarations are optional.
</para>
</sect2>
</sect1>

 <sect1 id="Status">
  <title>Status</title>
<para>
Qexo implements most but not all of the November 2003 XQuery draft.
Many of the standard functions are missing, as is support for
<literal>order by</literal> clauses and schema validation.
There is poor or missing support for some of the standard data types.
I hope to add these in the next few months.
There is some ad hoc and incomplete static typing.
A full implementation of static typing will be added later.
</para>
<para>
For more information, including download and usage instructions,
see the <ulink url="http://www.qexo.org">Qexo web site</ulink>,
or the <ulink url="http://www.gnu.org/software/kawa">general Kawa site</ulink>,
or email the author and implementor
at <email>per@bothner.com</email>.</para>
<para>
The Kawa compiler and Scheme runtime have for years been
successfully used in both research and production environments,
by a small but enthusiastic user group.  Early XQuery adopters are
doing the same with Qexo.
The goal of Qexo is an efficient, self-contained, and complete XQuery
implementation, which because of its open-source nature can be
tailored to different needs and environments.</para>
</sect1>

<bibliography>
<title>Bibliography</title>

<biblioentry id="Haskell">
<abbrev>Haskell</abbrev>
<authorgroup>
<editor><firstname>Simon</firstname> <surname>Peyton Jones (ed)</surname></editor>
</authorgroup>
<title>Haskell 98 Language and Libraries</title>
<subtitle>The Revised Report</subtitle>
<publisher><publishername>Cambridge University Press</publishername></publisher>
<pubdate>2003</pubdate>
<bibliomisc>See also the <ulink url="http://www.haskell.org">Haskell web site</ulink></bibliomisc>
</biblioentry>

<biblioentry id="Kawa">
<abbrev>Kawa98</abbrev>
<authorgroup>
<author><firstname>Per</firstname> <surname>Bothner</surname></author>
</authorgroup>
<title>Kawa: Compiling Scheme to Java</title>
<bibliomisc>Lisp Users Conference (Berkeley)</bibliomisc>
<pubdate>1998</pubdate>
</biblioentry>

<biblioentry id="BEA-XQRL03">
<abbrev>BEA03</abbrev>
<authorgroup>
<author><surname>Florescu et al (BEA)</surname></author>
</authorgroup>
<title>The BEA/XQRL Streaming XQuery Processor</title>
<bibliomisc>Proceedings of the 29th VLDB Conference</bibliomisc>
<pubdate>2003</pubdate>
</biblioentry>

<biblioentry id="Wadler87">
<abbrev>Wadler87</abbrev>
<authorgroup>
<author><firstname>Philip</firstname> <surname>Wadler</surname></author>
<author><firstname>R.J.M.</firstname> <surname>Hughes</surname></author>
</authorgroup>
<title>Projections for Strictness Analysis</title>
<bibliomisc>In <emphasis>Functional programming languages and computer architecture</emphasis></bibliomisc>
<publisher><publishername>Springer-Verlag</publishername></publisher>
<pubdate>1987</pubdate>
</biblioentry>

</bibliography>

</article>
