The lexical syntax determines how a character sequence is split into a sequence of lexemes, omitting non-significant portions such as comments and whitespace. The character sequence is assumed to be text according to the Unicode standard. Some lexemes of the lexical syntax, such as identifiers, representations of number objects, and strings, are syntactic data in the datum syntax and thus represent objects. Besides the formal account of the syntax, this section also describes what datum values are represented by these syntactic data.
The lexical syntax, in the description of comments, contains a forward reference to datum, which is described as part of the datum syntax. Being comments, however, these datums do not play a significant role in the syntax.
Case is significant except in representations of booleans, number objects, and in hexadecimal numbers specifying Unicode scalar values. For example, #x1A and #X1a are equivalent. The identifier Foo is, however, distinct from the identifier FOO.
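The case rules above can be checked with a short sketch. The names same-number and same-symbol are illustrative only, and the example assumes the default (non-folding) reader mode.

```scheme
;; Radix prefixes and hex digits ignore case, so both literals denote 26.
(define same-number (= #x1A #X1a))

;; Identifiers are case-sensitive, so these are two different symbols.
(define same-symbol (eq? 'Foo 'FOO))

(display same-number) (newline)   ; prints #t
(display same-symbol) (newline)   ; prints #f
```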
Interlexeme-space may occur on either side of any lexeme, but not within a lexeme.
Identifiers, the dot (.), numbers, characters, and booleans must be terminated by a delimiter or by the end of the input.
lexeme ::= identifier | boolean | number
    | character | string
    | ( | ) | [ | ] | #(
    | ' | ` | , | ,@ | .
    | #' | #` | #, | #,@
delimiter ::= ( | ) | [ | ] | " | ; | #
    | whitespace
((UNFINISHED))
Line endings are significant in Scheme in single-line comments and within string literals. In Scheme source code, any of the line endings matched by line-ending marks the end of a line. Moreover, the two-character line endings carriage-return linefeed and carriage-return next-line each count as a single line ending.
In a string literal, a line-ending not preceded by a \ stands for a linefeed character, which is the standard line-ending character of Scheme.
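As a minimal sketch of the string-literal rule, the example below writes a string across two source lines and inspects the character stored at the break. The name two-lines is hypothetical. The check assumes #\newline denotes the linefeed character, as in standard Scheme.

```scheme
;; The string literal below spans two source lines.
(define two-lines "first
second")

;; Whatever line ending the source file uses, the rule above says the
;; reader stores a single linefeed between the two words.
(display (char=? (string-ref two-lines 5) #\newline))
(newline)
```

If the file is saved with carriage-return linefeed endings, the result should be the same, since that pair counts as one line ending.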
intraline-whitespace ::= space | character-tabulation
whitespace ::= intraline-whitespace
    | linefeed | line-tabulation | form-feed
    | carriage-return | next-line
    | any character whose category is Zs, Zl, or Zp
line-ending ::= linefeed | carriage-return
    | carriage-return linefeed | next-line
    | carriage-return next-line | line-separator
comment ::= ; all subsequent characters up to a line-ending or paragraph-separator
    | nested-comment
    | #; interlexeme-space datum
    | shebang-comment
nested-comment ::= #| comment-text comment-cont* |#
comment-text ::= character sequence not containing #| or |#
comment-cont ::= nested-comment comment-text
atmosphere ::= whitespace | comment
interlexeme-space ::= atmosphere*
As a special case, the characters #!/ are treated as starting a comment, but only at the beginning of a file. These characters are used on Unix systems as a shebang interpreter directive. The Kawa reader skips the entire line. If the last non-whitespace character of the line is \ (backslash), then the following line is also skipped, and so on.
shebang-comment ::= #! absolute-filename text up to non-escaped line-ending
Whitespace characters are spaces, linefeeds, carriage returns, character tabulations, form feeds, line tabulations, and any other character whose category is Zs, Zl, or Zp. Whitespace is used for improved readability and as necessary to separate lexemes from each other. Whitespace may occur between any two lexemes, but not within a lexeme. Whitespace may also occur inside a string, where it is significant.
The lexical syntax includes several comment forms. In all cases, comments are invisible to Scheme, except that they act as delimiters, so, for example, a comment cannot appear in the middle of an identifier or representation of a number object.
A semicolon (;) indicates the start of a line comment. The comment continues to the end of the line on which the semicolon appears.
Another way to indicate a comment is to prefix a datum with #;, possibly with interlexeme-space before the datum. The comment consists of the comment prefix #; and the datum together. This notation is useful for “commenting out” sections of code.
Block comments may be indicated with properly nested #| and |# pairs.
#|
   The FACT procedure computes the factorial
   of a non-negative integer.
|#
(define fact
  (lambda (n)
    ;; base case
    (if (= n 0)
        #;(= n 1)
        1       ; identity of *
        (* n (fact (- n 1))))))
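Two details of the comment forms are easy to miss: block comments nest, and #; removes exactly one datum. The following sketch illustrates both. The name total is illustrative only.

```scheme
;; A block comment may contain another block comment.
#| outer #| inner |# still part of the outer comment |#

;; #; hides exactly one datum, here the list (* 100 100).
(define total (+ 1 #;(* 100 100) 2))

(display total) (newline)   ; prints 3
```

Because #; works on a whole datum, it can disable a multi-line expression without having to find where that expression ends.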
identifier ::= initial subsequent*
    | peculiar-identifier
initial ::= constituent | special-initial
    | inline-hex-escape
letter ::= a | b | c | ... | z
    | A | B | C | ... | Z
constituent ::= letter
    | any character whose Unicode scalar value is greater than 127, and whose category is Lu, Ll, Lt, Lm, Lo, Mn, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co
special-initial ::= ! | $ | % | & | * | / | < | =
    | > | ? | ^ | _ | ~
subsequent ::= initial | digit
    | any character whose category is Nd, Mc, or Me
    | special-subsequent
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
oct-digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
hex-digit ::= digit
    | a | A | b | B | c | C | d | D | e | E | f | F
special-subsequent ::= + | - | . | @
escape-sequence ::= inline-hex-escape
    | \ character-except-x
    | multi-escape-sequence
inline-hex-escape ::= \x hex-scalar-value ;
hex-scalar-value ::= hex-digit+
multi-escape-sequence ::= | symbol-element* |
symbol-element ::= any character except | or \
    | inline-hex-escape | mnemonic-escape | \|
character-except-x ::= any character except x
peculiar-identifier ::= + | - | ... | -> subsequent*
Most identifiers allowed by other programming languages are also acceptable to Scheme. In general, a sequence of letters, digits, and “extended alphabetic characters” is an identifier when it begins with a character that cannot begin a representation of a number object. In addition, +, -, and ... are identifiers, as is a sequence of letters, digits, and extended alphabetic characters that begins with the two-character sequence ->. Here are some examples of identifiers:
lambda q soup list->vector + V17a <= a34kTMNs ->- the-word-recursion-has-many-meanings
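A quick way to confirm that each of these examples reads as an identifier is to quote them and test the results with symbol?. This is only a sketch; samples and every-symbol? are hypothetical names introduced for the illustration.

```scheme
;; The example identifiers from the text, quoted so they are read as symbols.
(define samples
  '(lambda q soup list->vector + V17a <= a34kTMNs ->- ...))

;; True when every element of the list xs is a symbol.
(define (every-symbol? xs)
  (or (null? xs)
      (and (symbol? (car xs))
           (every-symbol? (cdr xs)))))

(display (every-symbol? samples)) (newline)   ; prints #t
```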
Extended alphabetic characters may be used within identifiers as if they were letters. The following are extended alphabetic characters:
! $ % & * + - . / < = > ? @ ^ _ ~
Moreover, all characters whose Unicode scalar values are greater than 127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within identifiers. In addition, any character can be used within an identifier when specified using an escape-sequence. For example, the identifier H\x65;llo is the same as the identifier Hello.
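The inline hex escape can be exercised directly, as in this small sketch. The name same-identifier is illustrative only.

```scheme
;; \x65; names the character with scalar value #x65, which is "e",
;; so both quoted identifiers denote the same symbol.
(define same-identifier (eq? 'H\x65;llo 'Hello))

(display same-identifier) (newline)               ; prints #t
(display (symbol->string 'H\x65;llo)) (newline)   ; prints Hello
```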
Kawa supports two additional non-R6RS ways of making identifiers using special characters, both taken from Common Lisp: any character (except x) following a backslash is treated as if it were a letter, as is any character between a pair of vertical bars.
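As a hedged sketch of these two Kawa extensions, the example below builds a symbol whose name contains a space and compares it with both notations. The name spaced is hypothetical, and the expected results follow from the rule as stated above rather than from a tested run.

```scheme
;; A symbol whose name contains a space, built without reader syntax.
(define spaced (string->symbol "hello world"))

;; Vertical bars: every character between the bars is part of the name.
(display (eq? '|hello world| spaced)) (newline)   ; expected #t

;; Backslash: the space after \ is treated as if it were a letter.
(display (eq? 'hello\ world spaced)) (newline)    ; expected #t
```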
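Identifiers have two uses within Scheme programs: any identifier may be used as a variable or as a syntactic keyword, and when an identifier appears as a literal or within a literal, it denotes a symbol.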
In contrast with older versions of Scheme, the syntax distinguishes between upper and lower case in identifiers and in characters specified via their names, but not in numbers, nor in inline hex escapes used in the syntax of identifiers, characters, or strings. The following directives give explicit control over case folding.
The directives #!fold-case and #!no-fold-case may appear anywhere comments are permitted and are treated as comments, except that they affect the reading of subsequent data. The #!fold-case directive causes the read procedure to case-fold (as if by string-foldcase) each identifier and character name subsequently read from the same port. The #!no-fold-case directive causes the read procedure to return to the default, non-folding behavior.
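The effect of the two directives can be sketched as follows. The names folded and unfolded are illustrative, and the expected values are what the description above implies when the whole block is read from one port.

```scheme
#!fold-case
;; While folding is on, HELLO and hello are read as the same identifier.
(define folded (eq? 'HELLO 'hello))

#!no-fold-case
;; Back in the default mode, case is significant again.
(define unfolded (eq? 'HELLO 'hello))

(display folded) (newline)     ; expected #t
(display unfolded) (newline)   ; expected #f
```

Since a directive affects everything read after it from the same port, it is safest to restore the default mode before the end of any region that relies on folding.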
Note that the colon (:) is treated specially for colon notation in Kawa Scheme, though it is a special-initial in standard Scheme (R6RS).
((INCOMPLETE))
number ::= ((TODO)) | quantity
decimal ::= digit+ optional-exponent
    | . digit+ optional-exponent
    | digit+ . digit+ optional-exponent
optional-exponent ::= empty
    | exponent-marker optional-sign digit+
exponent-marker ::= e | s | f | d | l
The letter used for the exponent in a floating-point literal determines its type:
e
    Returns a gnu.math.DFloat - for example 12e2. Note this matches the default when there is no exponent-marker.
s or f
    Returns a primitive float (or java.lang.Float when boxed as an object) - for example 12s2 or 12f2.
d
    Returns a primitive double (or java.lang.Double when boxed) - for example 12d2.
l
    Returns a java.math.BigDecimal - for example 12l2.
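To see which class each exponent marker produces, one option is to print every literal next to its runtime class. This is a sketch only: the name literals is hypothetical, and it relies on Kawa's invoke procedure to call the Java getClass method. Values stored in a list are boxed, so the primitive types should show up as java.lang.Float and java.lang.Double.

```scheme
;; One literal per exponent marker; each denotes twelve hundred.
(define literals (list 12e2 12s2 12f2 12d2 12l2))

;; Print each value followed by the Java class of its boxed form.
(for-each
 (lambda (x)
   (display x)
   (display " : ")
   (display (invoke x 'getClass))
   (newline))
 literals)
```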