List Comprehension Patterns
Many programming languages have some variant of list comprehension syntax. For example Haskell:[ 2 * x | x <- [2,3,5,7,11]]The part after the
|is the scan specifier
x <- [2,3,5,7,11]
, where the scan variable
x
ranges over the
elements of the scan sequence [2,3,5,7,11]
.
The result expression 2 * x
is
evaluated for each element of the resulting sequence,
using the scan variables from the scan specifier(s).
Kawa's new (and experimental) List Comprehension Patterns
separate these two parts by making the scan specifier part of
the more general pattern-matching feature.
The result expression can be widely separated from the scan specifier,
as long is it within the lexical scope of any scan variable it uses.
For example they can be in separate REPL commands
(the #|kawa:N#
is the REPL prompt).
First, some simple variable definions, using the Kawa-specific
(which is like !
define
extended to patterns):
#|kawa:1|# (! A [2 3 5 7 11]) #|kawa:2|# (! [A0 A1 A2 A3 A4] A)The first line binds the variable
A
to the result
of evaluating the sequence literal
[2 3 5 7 11]
.
The second illustrates matching
the pattern [A0 A1 A2 A3 A4]
against
the value of A
, which causes A0
...A4
to be matched against corresponding elements of A
:
#|kawa:3|# (+ A1 A4) 14Finally, here is a list comprehension (or scan) pattern:
#|kawa:4|# (! [a ...] A)This declares
a
to be a scan variable
that ranges over the elements of the scan sequence A
.
You can only use a scan variable inside a scan context:
#|kawa:5|# a /dev/stdin:2:1: using scan variable a while not in scan contextA
scan contextis an expression that is followed by ellipsis. The expression is evaluated once for each value of the corresponding scan variable(s), and the result sequence is
splicedinto its context. Here the scan context is the expression
(* 2 a)
,
which doubles the corresponding element a
:
#|kawa:6|# [(* 2 a) ...] [4 6 10 14 22]
This use of ellipsis is inspired by the syntax-rules
macro transformers of Scheme.
A scan variable can be used multiple times in the same scan context (ellipsis expression):
#|kawa:7|# [a ... a ...] [2 3 5 7 11 2 3 5 7 11] #|kawa:8|# [(* a a) ...] [4 9 25 49 121]Ellipses expressions are useful not just in sequence literals, but in the argument list of a procedure call, where the resulting sequence is spliced into the argument list. This is especially useful for functions that take a variable number of arguments, because that enables a convenient way to do fold/accumulate/reduce operations. For example:
#|kawa:9|# (+ a ...) 28because 28 is the result of
(+ 2 3 5 7 11)
.
An elegant way to implement dot product:
(define (dot-product [x ...] [y ...]) (+ (* x y) ...))When an ellipse expression references two or more distinct scan variables then they are scanned
in parallel. That does not (necessarily) imply muliple threads, but that the first element of the scan result is evaluated using the first element of all the scan sequences, the second element of the result uses the second element of all the scan sequences, and so on.
Sub-patterns in ellipsis patterns
The form before ...
in a pattern is itself a (sub-)pattern.
For example, the sub-pattern may include a type specifier,
which applies to each element:
#|kawa:11|# (define (isum [x::integer ...]) (+ x ...)) #|kawa:12|# (isum [4 5 6]) 15 #|kawa:12|# (isum [4 5.1 6]) Argument #1 (null) to 'isum' has wrong type at gnu.mapping.CallContext.matchError(CallContext.java:189) at atInteractiveLevel-6.isum$check(stdin:11) ...(The stack trace line number
stdin:11
is that of
the isum
definition.)
You can nest ellipsis patterns, allowing matching against sequences whose elements are sequences.
#|kawa:31|# (define (fun2 [[x ...] ...] [y ...]) #|.....32|# [[(+ x y) ...] ...]) #|kawa:33|# (fun2 [[1 2 3] [10 11 12]] [100 200]) [[101 102 103] [210 211 212]]
Note that x
is double-nested, while y
is singly-nested.
The following does not work at time of writing, but will soon:
(! [[x y] ...] sequence-of-pairs) ;;Not yet working
Filtering the scan sequence
Often you only want to use some of the elements from the scan sequence.
For example SQL's select
has a where
with a
boolean expression; only elements for which the expession is true are included.
With scan patterns we have the option to filter the values either where the
scan variables are introduced (the pattern), or when they are used.
(The examples in this section are not working - and the design is undecided.)
Kawa has a splice operator @expr
that takes a sequence and spreads it out to multiple arguuments.
(It is similar to JavaScripts's spread syntax ...expr
.)
It is useful to allow splice operators in scan expressions,
and even more useful to allow them in one or more branches of an if
.
The following expressions selects the odd elements from a
,
doubling them, but drops even elements.
It does the latter by returning a 0-length splice for each even element:
[(if (odd? a) (* 2 a) @[]) ...]
Some reasonable syntatic sugar would be to allow leaving out
the else
expression of an if
, having it default
to a zero-length splice:
[(if (odd? a) (* 2 a)) ...]
APL has an expand
operator that repeats elements of an array
based on another count array.
This functionality can be achived using splices in scan expressions.
For example, the following repeats each odd element, and skips even elements:
[(if (odd? a) @[a a] @[]) ...]Equivalently (parsing issues aside):
[@(if (odd? a) [a a] []) ...]
It is also useful to filter at the pattern level.
The following hypothetical syntax would create a scan variable a
from those elements of the scan sequence A
that
satisfy the boolean expression (P a)
.
(! [a ...] A #!if (P a)) ;; hypotheticalNote the sequence of
a
values used to evaluate (P a)
would be different than the values in scan expressions.
That might be an issue.
Outer product
Instead of processing multiple sequences in parallel,
sometimes you want the multiplicative
combination of elements
from all the sequences.
This is called Cartesian product (or outer product in APL),
or an cross join in database query languages.
This could be handled with a (not implemented) outer-product function:
(outer-product [10 20] [1 2 3]) ⇒ [[10 1] [10 2] [10 3] [20 1] [20 2] [20 3] [30 1] [30 2] [30 3]]We could use it like this:
(! [[x y] ...) (outer-product X Y))
The above syntax is a bit awkward. Some syntatic sugar would help. If so, it should include an option for filtering only the desired combinations based on a boolean expression.
Comprehension as looping
Scan expression can form the basis of a looping framework
in a programming language.
Assume a function ignore
that
ignores its arguments.
Then this is one way to write a
loop,
using its scan expression for its side effects:
for
(let (([i ...] [0 <: 10])) (ignore (do-something-with i) ...))The scan valuable
i
range of the
range expression
[0 <: 10]
.
The scan expression (do-something-with i)
is evaluated
for each i
.
A general looping construct needs a way to exit the loop
based on a condition, as in a while
loop.
Than can be done by having a scan expression return
a special #!done
value.
This assumes the scan elements are evaluated sequentially, which is a reasonable default. It would be useful to have some way to request that evaluation be done in parallel. This could be implemented by translating the code to use Java 8 streams, though for performance reasons we only want to do that when parallel execution is wanted.
Implementation notes
A scan variable is implemented as either a java.util.List
or a (Java) array.
Using an array may be more efficient, partly because it easier to avoid boxing;
on the other hand it may require more copying..
Scan patterns are expanded by the compiler to a loop which initializes
the array. Scan expressions expand to map
calls,
which are generally inlined to loops.
Numerous optimizations and performance improvements are possible.