Updating-NodesExtending Qexo/Kawa for updates
A number of people are interesing in extending XQuery for updates. Here are some useful notes.
Updating means at least two different things:
Modifying an in-core node object, and
modifying a node in a persistent xml data database.
They're very different. Let's start with the former.
Qexo's node model
You might want to read the
gnu.lists package descriptor
for an overview of the concepts of Kawa's sequence and node objects.
A node, in the XML sense, is represented as a pair of an
AbstractSequence and an index.
The index (a position value)
is just a unique number managed by the
There are a number of implementation classes that extend
AbstractSequence, and use different ways of
managing position indexes. The one used for XML nodes
NodeTree, which is an extension of
The nodes of a document or document fragment are all in a a
each node is identified by a position index, which basically
an index in the
data array, but with
the lower-order bit used as a special flag. (See the above-mentioned
When we need to create an object for a node, we use a
KNode object. The idea is that most nodes aren't actively
referenced, so we don't need an actual
which saves a lot of space.
Updating nodes in-place
To implement updating a node object in-memory we need
to finish the update/insert/delete abilities in
The latter class is basically a gap-vector (as used in emacs and Swing),
but the data structures are more complicated because it stores a
hierarchy, rather than just characters.
Once we can update the
TreeList, we will need an extra level
of indirection. The reason is that
node identity is tied to
the position indexes, but editing a
the nodes in it move around.
The solution is to use either StableVector or something
similar. Unfortunately, StableVector doesn't currently support
TreeList. Perhaps TreeList should be changed to extend GapVector.
A more abstract way to think of it: A Node needs to be a pair of a NodeManager and an index that is managed by the NodeManager. The actual underlying storage is in a TreeList, but since indexes in a TreeList change on updates, the actual Node indexes are indexes in the NodeManager. Each time you read a property of a node, you use the node's index, which is an index in the NodeManager. You use that index in the NodeManager position array, which gives us an index in the TreeList, and get the value from the latter. To update a node, we have to similarly dereference the index in the NodeManager to get an index the the TreeLists's data array, and update the latter. That may require things to move around in the TreeList, so the indexes in the NodeManager have to be updated.
Moving nodes from one document or fragment to another is tricky. The reason is that node indexes are relative to a TreeList. One solution is to use forwarding pointers. Another is a NodeManger that can handle multiple TreeLists.
Updating XML databases
Updating a XML files or a database is more complicated. One approach is reading an XML document, updating nodes in-memory, and writing out the modified document. That is practical for modest-sized XML documents, but expensive for small changes to large documents. Another issue is that it is difficult (but not impossible) to maintain node identity between the original document and the updated version, even for nodes that are unmodified.
Ideally, one would like to modify individial nodes in-place in
the database. Thsi is doable in the Kawa node model. The basic idea
is to create a
which we might call
DatabaseDocument would be a proxy for either
the entire database, or an individual xml document. Each node has a
database key. The
DatabaseDocument object manages the
mapping between position indexes and database keys.
Note there are positions of the Qexo run-time that assume nodes are
NodeTree. They would have to be fixed
to support general
Of course once one is updating a database we also have to deal
with transactions and related