Drafts http://per.bothner.com/drafts/ Home
Comments http://per.bothner.com/Languages/Comments/ http://per.bothner.com/Languages/Comments/ draft kawa Wed, 14 Mar 2007 17:36:50 -0700 2007-03-15T00:36:50Z
<p> A programming language should have (at least) these two kinds of comments:</p>
<ul>
<li>A comment that extends to the end of the line.</li>
<li>A comment that extends to an end-comment delimiter. Such comments should <em>nest</em>, unlike the Java <code>/* ... */</code> comments.</li>
</ul>
<p> An interesting option for nestable comments is for the start delimiter to be <code>#!</code>. The end delimiter could be <code>!#</code>. This allows a script to begin with a shell preamble that the language itself reads as a comment: </p>
<pre>
#!/bin/sh
exec kawa --options "$0" "$@"
!#
(define ....)
</pre>
Patterns http://per.bothner.com/Languages/Patterns/ http://per.bothner.com/Languages/Patterns/ draft Wed, 14 Mar 2007 17:36:18 -0700 2007-03-15T00:36:18Z
<p> A <dfn>pattern</dfn> can be matched against a value. If it matches, one or more variables may be <dfn>bound</dfn> to some part of the matched value.
<p> Patterns can be used in various declaration contexts, including variable declarations, parameter declarations, and cases of a switch expression.
<h2>Abstract pattern grammar</h2>
<p> Here is a classification of the patterns we should support. The concrete syntax is not fixed.
<h3>Variables</h3>
<p> The simplest pattern is a variable. This declares that variable, and it is bound to the value being matched against. Question: It may make sense to use a special syntactic marker to indicate a variable being declared, as opposed to being used.
<h3>Type specification</h3>
<pre> <var>pattern</var>!<var>type</var> </pre>
<h3>Conjunction</h3>
<pre> <var>pattern1</var>&<var>pattern2</var>... </pre>
This matches <var>pattern1</var> against the target, possibly binding some variables. Then <var>pattern2</var> is matched against the same target. The <var>pattern2</var> may use variables bound in <var>pattern1</var>.
Commonly, <var>pattern2</var> will be a predicate or a type-specifier. In fact, a special syntax for conjunction may not be needed, since it can be expressed using a predicate.
<h3>Predicate</h3>
<pre> {<var>boolean-expression</var>} </pre>
This matches if the <var>boolean-expression</var> evaluates to true. Typically, <var>boolean-expression</var> will refer to variables declared earlier in a conjunction. In fact, we could combine the syntaxes:
<pre> {<var>pattern</var>|<var>boolean-expression</var>} </pre>
<h3>Constructor</h3>
<pre> <var>constructor-name</var>(<var>pattern1</var>, <var>pattern2</var>, ...) </pre>
Shell interface http://per.bothner.com/Kawa/Shell/ http://per.bothner.com/Kawa/Shell/ draft kawa shell Sat, 03 Feb 2007 11:49:08 -0800 2007-02-03T19:49:08Z
<p>Up: <a href="http://per.bothner.com/drafts/./../Kawa/">Kawa</a></p>
<h2>Running commands</h2>
<pre> (run command <var>arg</var> ...) </pre>
The <code>command</code> is an executable program or script, and the <code><var>arg</var></code> are command line arguments. (For now, leave it open whether these are evaluated or quoted.)
<p> The standard output of the command is effectively redirected to a temporary file, and the contents of this file, viewed as a string or text object, become the result of the <code>run</code> expression. If the output consumer for the <code>run</code> is an output port, then the command's standard output is redirected to that port. In the initial case, the output consumer is the standard output stream of the containing JVM, so no redirection is needed.
<p> The standard error output of the <code>command</code> is piped to the current error port. If the error port matches the initial error port, no redirection is needed.
<p> The standard input of the <code>command</code> is connected to the current input port of the dynamic context.
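<p> The stream handling described above can be approximated on the JVM with <code>java.lang.ProcessBuilder</code>. The following is only a rough sketch under some assumptions: the class <code>RunSketch</code> and its <code>run</code> method are hypothetical names, not part of Kawa; the output is read from a pipe rather than a temporary file; and only the initial case is covered, where the error and input ports are the JVM's own streams.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Kawa's implementation of (run command arg ...).
public class RunSketch {
    // Runs the command and returns everything it wrote to standard output.
    public static String run(String... commandAndArgs)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(commandAndArgs);
        // Standard input comes from the JVM's own input stream
        // (assumed to be the "current input port").
        builder.redirectInput(ProcessBuilder.Redirect.INHERIT);
        // Standard error goes to the JVM's own error stream
        // (the case where the error port is the initial error port).
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        // Collect standard output; it becomes the result of the call.
        byte[] output = process.getInputStream().readAllBytes();
        process.waitFor();
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Assumes an "echo" executable is available on the PATH.
        System.out.print(run("echo", "hello"));
    }
}
```

<p> Here the captured text plays the role of the string result of the <code>run</code> expression; redirecting to an arbitrary output or error port would need extra copying between streams, which this sketch leaves out.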
<p>Discussion: An alternative is to define <code>run</code> so the output from the command is written to the current output port. One could then reify the output from a command with some kind of a <code>with-output-to-string</code> macro.
<h2>File name expansion</h2>
<pre> (glob regexp) </pre>
<p> Return a set of Path values that match the regexp, as multiple values. These can be interpolated in a <code>run</code> argument list.
Updating-Nodes http://per.bothner.com/Kawa/Updating-Nodes/ http://per.bothner.com/Kawa/Updating-Nodes/ draft kawa xml Sat, 03 Feb 2007 10:26:49 -0800 2007-02-03T18:26:49Z Extending Qexo/Kawa for updates
<p> A number of people are interested in extending XQuery for updates. Here are some useful notes.</p>
<p> <q>Updating</q> means at least two different things: modifying an in-core node object, and modifying a node in a persistent XML database. They're very different. Let's start with the former.</p>
<h3>Qexo's node model</h3>
<p> You might want to read the <a href="http://www.gnu.org/software/kawa/api/gnu/lists/package-summary.html#package_description"><code>gnu.lists</code> package descriptor</a> for an overview of the concepts of Kawa's sequence and node objects. A node, in the XML sense, is represented as a pair of an <code>AbstractSequence</code> and an index. The index (a <dfn>position value</dfn>) is just a unique number managed by the <code>AbstractSequence</code>. There are a number of implementation classes that extend <code>AbstractSequence</code> and use different ways of managing position indexes. The one used for XML nodes is a <code>NodeTree</code>, which is an extension of <code>TreeList</code>. The nodes of a document or document fragment are all in a single <code>NodeTree</code>; each node is identified by a position index, which is basically an index in the <code>TreeList</code>'s <code>data</code> array, but with the low-order bit used as a special flag.
(See the above-mentioned descriptor in <code>gnu.lists</code>.) When we need to create an object for a node, we use a <code>KNode</code> object. The idea is that most nodes aren't actively referenced, so they don't need an actual <code>KNode</code> object, which saves a lot of space. </p>
<h3>Updating nodes in-place</h3>
<p>To implement updating a node object in-memory we need to finish the update/insert/delete abilities in <code>gnu.lists.TreeList</code>. The latter class is basically a gap-vector (as used in Emacs and Swing), but the data structures are more complicated because it stores a hierarchy, rather than just characters. Once we can update the <code>TreeList</code>, we will need an extra level of indirection. The reason is that <q>node identity</q> is tied to the position indexes, but editing a <code>NodeTree</code> causes the nodes in it to move around. The solution is to use either StableVector or something similar. Unfortunately, StableVector doesn't currently support TreeList. Perhaps TreeList should be changed to extend GapVector.</p>
<p> A more abstract way to think of it: A Node needs to be a pair of a NodeManager and an index that is managed by the NodeManager. The actual underlying storage is in a TreeList, but since indexes in a TreeList change on updates, the actual Node indexes are indexes in the NodeManager. Each time you read a property of a node, you use the node's index, which is an index in the NodeManager. You look up that index in the NodeManager's position array, which gives you an index in the TreeList, and get the value from the latter. To update a node, we similarly have to dereference the index in the NodeManager to get an index into the TreeList's data array, and update the latter. That may require things to move around in the TreeList, so the indexes in the NodeManager have to be updated.</p>
<p> Moving nodes from one document or fragment to another is tricky. The reason is that node indexes are relative to a TreeList.
One solution is to use forwarding pointers. Another is a NodeManager that can handle multiple TreeLists.</p>
<h3>Updating XML databases</h3>
<p>Updating XML files or a database is more complicated. One approach is reading an XML document, updating nodes in-memory, and writing out the modified document. That is practical for modest-sized XML documents, but expensive for small changes to large documents. Another issue is that it is difficult (but not impossible) to maintain node identity between the original document and the updated version, even for nodes that are unmodified.
<p> Ideally, one would like to modify individual nodes in-place in the database. This is doable in the Kawa node model. The basic idea is to create an <code>AbstractSequence</code> sub-class, which we might call <code>DatabaseDocument</code>. The <code>DatabaseDocument</code> would be a proxy for either the entire database, or an individual XML document. Each node has a database key. The <code>DatabaseDocument</code> object manages the mapping between position indexes and database keys.
<p> Note that there are portions of the Qexo run-time that assume nodes are implemented using <code>NodeTree</code>. They would have to be fixed to support general <code>AbstractSequence</code>s.
<p> Of course, once one is updating a database, one also has to deal with transactions and related <q>ACID</q> issues.
Character escapes http://per.bothner.com/Languages/CharacterEscapes/ http://per.bothner.com/Languages/CharacterEscapes/ draft kawa Sat, 03 Feb 2007 10:21:42 -0800 2007-02-03T18:21:42Z
<p> Use the standard <q><code>\</code></q> to escape special characters, both inside string literals and outside them. In general (outside string literals) a <q><code>\</code></q> followed by a non-letter character makes that character be treated as a letter. E.g.
<code>\1\+2</code> is a 3-character identifier consisting of the characters <code>1</code>, <code>+</code>, and <code>2</code>, even if the language otherwise doesn't allow identifiers to start with digits or to contain <code>+</code>.</p>
<p> Letters don't need to be escaped, in either identifiers or names. So we're free to use <q><code>\</code></q> followed by a letter for other purposes, including the standard C string escapes. I suggest at least the following:</p>
<p> <code>\xNNNN</code> - A Unicode escape. Terminated by the first character that is neither a digit nor a letter. If that character is a space, it is ignored. Only a single space is ignored.</p>
<p> <code>\n</code> - A newline.</p>
<p>...</p>
<p> The string form of regular expressions should be compatible with this convention.</p>
Kawa http://per.bothner.com/Kawa/ http://per.bothner.com/Kawa/ draft kawa Sat, 03 Feb 2007 10:15:05 -0800 2007-02-03T18:15:05Z
Future Kawa link overview page ... For example sub-page <a href="http://per.bothner.com/drafts/./../Kawa/Shell/">Shell</a>. Or <a href="http://per.bothner.com/drafts/./../Kawa/Shell/">Shell</a>.
UI http://per.bothner.com/UI/ http://per.bothner.com/UI/ UI draft Sat, 03 Feb 2007 10:13:49 -0800 2007-02-03T18:13:49Z
Future UI links. Cross-link <a href="http://per.bothner.com/drafts/./../Kawa/Shell/">Shell</a> to <a href="http://per.bothner.com/drafts/./../Kawa/">Kawa</a> or index.