XML literals

XML literals are a new feature in Kawa. They make it more convenient to write XML data objects, especially if you're familar with XML syntax. (This is an extract from the SVN version of the Kawa manual.)

You can write XML literals directly in Scheme code, following a #. Notice that the outermost element needs to be prefixed by #, but nested elements do (and must not).

     #<p>The result is <b>final</b>!</p>

Actually, these are not really literals since they can contain enclosed expressions:

     #<em>The result is &{result}.</em>

The value of result is substituted into the output, in a similar way to quasi-quotation. (If you try to quote one of these “XML literals”, what you get is unspecified and is subject to change.)

An xml-literal is usually an element constructor, but there some rarely used forms (processing-instructions, comments, and CDATA section) we'll cover later.

     xml-literal ::= #xml-constructor
     xml-constructor ::= xml-element-constructor
       | xml-PI-constructor
       | xml-comment-constructor
       | xml-CDATA-constructor

Element constructors

     xml-element-constructor ::=
         <QName xml-attribute*>xml-element-datum...</QName >
       | <xml-name-form xml-attribute*>xml-element-datum...</>
       | <xml-name-form xml-attribute*/>
     xml-name-form ::= QName
       | xml-enclosed-expression
     xml-enclosed-expression ::=
         {expression}
       | (expression...)

The first xml-element-constructor variant uses a literal QName, and looks like standard non-empty XML element, where the starting QName and the ending QName must match exactly:

     #<a href="next.html">Next</a>

As a convenience, you can leave out the ending tag(s):

     <para>This is a paragraph in <emphasis>DocBook</> syntax.</>

You can use an expression to compute the element tag at runtime - in that case you must leave out the ending tag:

     #<p>This is <(if be-bold 'strong 'em)>important</>!</p>

You can use arbitrary expression inside curly braces, as long as it evaluates to a symbol. You can leave out the curly braces if the expression is a simple parenthesised compound expression. The previous example is equivalent to:

     #<p>This is <{(if be-bold 'strong 'em)}>important</>!</p>

The third xml-element-constructor variant above is an XML “empty element”; it is equivalent to the second variant when there are no xml-element-datum items.

(Note that every well-formed XML element, as defined in the XML specifications, is a valid xml-element-constructor, but not vice versa.)

Elements contents (children)

The “contents” (children) of an element are a sequence of character (text) data, and nested nodes. The characters &, <, and > are special, and need to be escaped.

     xml-element-datum ::=
         any character except &, or <.
       | xml-constructor
       | xml-escaped
     xml-escaped ::=
         &xml-enclosed-expression
       | &xml-entity-name;
       | xml-character-reference
     xml-character-reference ::=
         &#decimal-digit+;
       | &#xhex-digit+;

Here is an example shows both hex and decimal character references:

     #<p>A&#66;C&#x44;E</p>  ⇒  <p>ABCDE</p>

Currently, the only supported values for xml-entity-name are the builtin XML names lt, gt, amp, quot, and apos, which stand for the characters <, >, &, ", and ', respectively. The following two expressions are equivalent:

     #<p>&lt; &gt; &amp; &quot; &apos;</p>
     #<p>&{"< > & \" '"}</p>

Attributes

     xml-attribute ::=
         xml-name-form=xml-attribute-value
     xml-attribute-value ::=
         "quot-attribute-datum*"
       | 'apos-attribute-datum*'
     quot-attribute-datum ::=
         any character except ", &, or <.
       | xml-escaped
     apos-attribute-datum ::=
         any character except ', &, or <.
       | xml-escaped

If the xml-name-form is either xmlns or a compound named with the prefix xmlns, then technically we have a namespace declaration, rather than an attribute.

QNames and namespaces

The names of elements and attributes are qualified names (QNames), which are represented using compound symbols (see Namespaces). The lexical syntax for a QName is either a simple identifier, or a (prefix,local-name) pair:

     QName ::= xml-local-part
        | xml-prefix:xml-local-part
     xml-local-part ::= identifier
     xml-prefix ::= identifier

An xml-prefix is an alias for a namespace-uri, and the mapping between them is defined by a namespace-declaration. You can either use a define-namespace form, or you can use a namespace declaration attribute:

     xml-namespace-declaration-attribute ::=
         xmlns:xml-prefix=xml-attribute-value
       | xmlns=xml-attribute-value

The former declares xml-prefix as a namespace alias for the namespace-uri specified by xml-attribute-value (which must be a compile-time constant). The second declares that xml-attribute-value is the default namespace for simple (unprefixed) element tags. (A default namespace declaration is ignored for attribute names.)

     (let ((qn (element-name #<gnu:b xmlns:gnu="http://gnu.org/"/>)))
       (list (symbol-local-name qn)
             (symbol-prefix qn)
             (symbol-namespace-uri qn)))
     ⇒ ("b" "gnu" "http://gnu.org/")

Other XML types

Processing instructions

An xml-PI-constructor can be used to create an XML processing instruction, which can be used to pass instructions or annotations to an XML processor (or tool). (Alternatively, you can use the processing-instruction type constructor.)

     xml-PI-constructor ::= <?xml-PI-target xml-PI-content?>
     xml-PI-target ::= NCname (i.e. a simple (non-compound) identifier)
     xml-PI-content ::= any characters, not containing ?>.

For example, the DocBook XSLT stylesheets can use the dbhtml instructions to specify that a specific chapter should be written to a named HTML file:

     #<chapter><?dbhtml filename="intro.html" ?>
     <title>Introduction</title>
     ...
     </chapter>

XML comments

You can cause XML comments to be emitted in the XML output document. Such comments can be useful for humans reading the XML document, but are usually ignored by programs. (Alternatively, you can use the comment type constructor.)

     xml-comment-constructor ::= <!--xml-comment-content-->
     xml-comment-content ::= any characters, not containing --.

CDATA sections

A CDATA section can be used to avoid excessive use of xml-entity-ref such as & in element content.

     xml-CDATA-constructor ::= <![CDATA[xml-CDATA-content]]>
     xml-CDATA-content ::= any characters, not containing ]]>.

The following are equivalent:

     #<p>Specal characters <![CDATA[< > & ' "]]> here.</p>
     #<p>Specal characters &lt; &gt; &amp; &quot; &apos; here.</p>

Kawa remembers that you used a CDATA section in the xml-element-constructor and will write it out using a CDATA constructor.

Tags: Scheme kawa

Per Bothner's blog