Enhanced multi-dimensional Arrays

This describe the array data type (a generalization of vectors to multiple indexes or dimensions), along with a set of procedures for working on them.

This specifiation is an extension of SRFI 25, with additions from Racket’s math.array package and other sources. It has been implemented in the Kawa dialect of Scheme.

Issues

The range API is new and untested. Some changes (including functions names) may amke sense. It may make sense to move it into a separate SRFI.

Right now we have shape-specifiers, and the SRFI-4 shape procedure, which creates one kind of shape-specifier. However, there is no array-shape procedure to return a shape as a single object. Instead, there is the array-rank, array-start, and array-end procedures to inspect the shape. Is that enough? Should we have an explicit shape type? (This is essentially the interval type of SRFI-122.) Should it be a distinct opaque type, or one of the shape specifier types?

Any important missing procedures? It might be useful to have array-copy, though if we have array-shape then (array-copy arr) is trivially defined as (array-reshape (array-flatten arr) (array-shape arr)).

Rationale

Specification

Arrays are heterogeneous data structures that generalize vectors to multiple indexes or dimensions. Instead of a single integer index, there are multiple indexes: An index is a vector of integers; the length of a valid index sequence is the rank or the number of dimensions of an array.

A generalized vector is a standard vector, but (depending on implementation) can also include uniform vectors, bytevectors, ranger, and strings (viewed as vectors of characters). Arrays are a extension of vectors: An array whose rank is 1, and where the (single) lower bound is 0 is a (generalized) vector. A library-based implementation of this specification should use the existing (generalized) vector type for "simple" rank-1 zero-lower-bound arrays (not created by share-array).

A rank-0 array has a single value. It is essentially a box for that value. Functions that require arrays may treat non-arrays as a rank-0 array containing that value.

The examples use the array literal syntax and format-array helper function of SRFI 163 for illustrative purposes. However, this specifcation does not require SRFI 163 (or vice versa), though they are designed to work well together.

Ranges

This should/could be moved out to a separate SRFI.

A range has a lower bound, upper bound, and a step that defaults to 1. It is a new type of immutable generalized vector with an optimized compact representation.

A range is an immutable sequence of values that increase linearly - i.e. by a fixed amount (the step) between each element.

(range-from start [step])

Creates an unbounded or non-finite range, where step defaults to 1. The result is an infinite sequence of values, starting with start, and followed by (+ start step), (+ start (* 2 step)), and so on. For example (range-from 3 2) is the odd integers starting at 3.

(range< end [start [step]])

(range<= end [start [step]])

In these cases the sequence counts up: The step must be positive, and defaults to 1, while start defaults to 0. The resulting values is the initial subsequence of (range-from start step) as long as the value x satisfies (< x end), or (<= x end), respectively. (Note: It might be more useful in the range<= case for start defaults to 1, but consistency is probably more desirable.)

(range> end start [step])

(range>= end start [step])

In these cases the sequence counts down: The step must be negative, and defaults to -1. The resulting values are those x such that (> x end), or (>= x end), respectively.

(range-take count [start [step]])

In this case the resulting range is the first cunt elements of the unbounded sequence. The expression (range-take count start step) yields the same sequence as SRFI 1's (iota count start step).

This specification is based on Kawa's range type, though Kawa has a prettier syntax.

Array shape

The shape of an array consists of bounds for each index.

The lower bound b and the upper bound e of a dimension are exact integers with (<= b e). A valid index along the dimension is an exact integer i that satisfies both (<= b i) and (< i e). The length of the array along the dimension is the difference (- e b). The size of an array is the product of the lengths of its dimensions.

A procedure that requires a shape can accept any of the follow shape-specifier values. We use as example a 2*3 array with lower bounds 0, and a 3*4 array with lower bounds 1.

A vector of simple integers. Each integer e is an upper bound, and is equivalent to the range (range< e).
Examples: #(2 3), and the second is not expressible.
A vector of lists of length 2. The first element of each list is the lower bound, and the second is the upper bound. (This variant is not currently implemented in Kawa, but it seems a useful syntax.)
Examples: #((0 2) (0 3)) and #((1 3) (1 4)).
A vector of simple ranges, one for each dimension, all of who are bounded (finite), consist of integer values, and have a step of 1. Each range, which is usually written as (range< e b) or (range< e) expresses the bounds of the corresponding dimension.
Examples: (vector (range< 2) (range<= 2)) and (vector (range< 3 1) (range-take 4 1)).
Using ranges without syntatic sugar can be a bit verbose. Using the Kawa syntactic sugar, the examples are: [[0 <: 2] [0 <=: 2]] and [[1 <: 3] [1 size: 4]].)
A vector consisting of a mix of integers, length-2 lists, and ranges.
Examples: #(2 (0 3)) and `#((1 3) ,(range 1 size: 4)).
A rank-2 array S whose own shape is #(r 2). For each dimension k (where (<= k 0) and (< k r)), the lower bound b_k is (S k 0), and the upper bound e_k is (S k 1).
Examples: #2a((0 2) (0 3)) and #2a((1 3) (1 4)).

Procedure: shape bound ...

Returns a shape, in the (r 2) form. The sequence bound ... must consist of an even number of exact integers that are pairwise not decreasing. Each pair gives the lower and upper bound of a dimension. If the shape is used to specify the dimensions of an array and bound ... is the sequence b0 e0 ... bk ek ... of n pairs of bounds, then a valid index to the array is any sequence j0 ... jk ... of n exact integers where each jk satisfies (<= bk jk) and (< jk ek).

The shape of a d-dimensional array is a d * 2 array where the element at k 0 contains the lower bound for an index along dimension k and the element at k 1 contains the corresponding upper bound, where k satisfies (<= 0 k) and (< k d).

(apply shape bounds) is equivalent to: (apply array (vector 2 (/ (length bounds) 2)) bounds)

Procedure: array-rank array

Returns the number of dimensions of array.
(array-rank
  (make-array (shape 1 2 3 4)))
Returns 2.

Procedure: array-start array k

Returns the lower bound (inclusive) for the index along dimension k. This is most commonly 0.

Procedure: array-end array k

Returns the upper bound for the index along dimension k. The bound is exclusive - i.e. the first integer higher than the last legal index.

Procedure: array-size array

Return the total number of elements of array. This is the product of (- (array-end array k) (array-start array k)) for every valid k.

Array construction

Array indexing

Given a rank-2 array arr with integer indexes i and j, the following all get the element of arr at index [i j].

(array-index-ref arr i j)
(array-ref arr i j)
(array-ref arr [i j])

Using array-index-ref (but not plain array-ref) you can do generalized APL-style slicing and indirect indexing.

An optional extension (implemented in Kawa) allows you to call an array as it it were a function (function-call notation). This should be equivalent to calling array-index-ref.

(arr i j) ;; Optional extension, equivalent to (array-index-ref arr i j)

Procedure: array-ref array k ...

Procedure: array-ref array index

Returns the contents of the element of array at index k .... The sequence k ... must be a valid index to array. In the second form, index must be a vector (a 0-based 1-dimensional array) containing k ....
(array-ref (array [2 3]
              'uno 'dos 'tres
              'cuatro 'cinco 'seis)
   1 0)
Returns cuatro.
(let ((a (array (shape 4 7 1 2) 3 1 4)))
   (list (array-ref a 4 1)
         (array-ref a (vector 5 1))
         (array-ref a (array (shape 0 2)
                         6 1))))
Returns (3 1 4).

Procedure: array-index-ref array index ...

Generalized APL-style array indexing, where each index can be either an array or an integer.

If each index is an integer, then the result is the same as array-ref.

Otherwise, the result is an immutable array whose rank is the sum of the ranks of each index. An integer is treated as rank-0 array.

If marr is the result of (array-index-ref arr M₁ M₂ ...) then:
(marr i₁₁ i₁₂ ... i₂₁ i₂₂ ...)
is defined as:
(arr (M₁ i₁₁ i₁₂ ...) (M₂ i₂₁ i₂₂ ...) ...)
Each M_k gets as many indexes as its rank. If M_k is an integer, then it we use it directly without any indexing.

Here are some examples, starting with simple indexing.
(define arr (array #2a((1 4) (0 4))
                   10 11 12 13 20 21 22 23 30 31 32 33))
arr ⇨
╔#2a@1:3:4══╗
║10│11│12│13║
╟──┼──┼──┼──╢
║20│21│22│23║
╟──┼──┼──┼──╢
║30│31│32│33║
╚══╧══╧══╧══╝
(array-index-ref arr 2 3) ⇨
23
If one index is a vector and the rest are scalar integers, then the result is a vector:
(array-index-ref arr 2 #(3 1)) ⇨
#(23 21)
You can select a “sub-matrix” when all indexes are vectors:
(array-index-ref arr #(2 1) #(3 1 3)) ⇨
╔#2a:2:3═╗
║23│21│23║
╟──┼──┼──╢
║13│11│13║
╚══╧══╧══╝
Using ranges for index vectors selects a rectangular sub-matrix.
(array-index-ref arr (range< 3 1) (range< 4 1)) ⇨
╔#2a:2:3═╗
║11│12│13║
╟──┼──┼──╢
║21│22│23║
╚══╧══╧══╝
You can add new dimensions:
(array-index-ref arr #(2 1) #2a((3 1) (3 2))) ⇨
#3a╤══╗
║23│21║
╟──┼──╢
║23│22║
╠══╪══╣
║13│11║
╟──┼──╢
║13│12║
╚══╧══╝
FIX: Need alternate non-Kawa way to expression pseudo-ranges. The pseudo-range [<:] can be used to select all the indexes along a dimension. To select row 2 (1-origin):
(array-index-ref arr 2 [<:]) ⇨
#(20 21 22 23)
To reverse the order use [>:]:
(array-index-ref arr 2 [>:]) ⇨
#(23 22 21 20)
To select column 3:
(array-index-ref arr [<:] [3]) ⇨
#2a╗
║13║
╟──╢
║23║
╟──╢
║33║
╚══╝
To expand that column to 5 colums you can repeat the column index:
(range-take 5 3 0)
#(3 3 3 3 3)
(array-index-ref arr [<:] (range-take 5 3 0)) ⇨
╔#2a:3:5═╤══╤══╗
║13│13│13│13│13║
╟──┼──┼──┼──┼──╢
║23│23│23│23│23║
╟──┼──┼──┼──┼──╢
║33│33│33│33│33║
╚══╧══╧══╧══╧══╝

Modifying arrays

An implementation that supports SRFI 17 (generalied set!, should allow set! to modify one or multiple elements. To modify a single element:

(set! (arr index ...) new-value)

should be equivalent to

(array-set! arr index ... new-value)

You can set a slice (or all of the elements). In that case:

(set! (arr index ...) new-array)

is equivalent to:

(array-copy! (array-index-share arr index ...) new-array)

Procedure: array-set! array k ... obj

Procedure: array-set! array index obj

Stores obj in the element of array at index k .... Returns the void value. The sequence k ... must be a valid index to array. In the second form, index must be either a vector or a 0-based 1-dimensional array containing k ....
(let ((a (make-array
            (shape 4 5 4 5 4 5))))
   (array-set! a 4 4 4 "huuhkaja")
   (array-ref a 4 4 4))
Returns "huuhkaja".

Compatibility: SRFI-47, Guile and Scheme-48 have array-set! with a different argument order.

Procedure: array-copy! dst src

Compatibility: Guile has a array-copy! with the reversed argument order.

Procedure: array-fill! array value

Set all the values array to value. You can use (array-fill! (array-index-share array indexes...) value) to set a subset of the array.

Transformations and views

A view or transform of an array is an array a₂ whose elements come from some other array a₁, given some transform function T that maps a₂ indexes to a₁ indexes. Specifically (array-ref a₂ indexes) is (array-ref a₁ (T indexes)). Modifying a₂ causes a₁ to be modified; modifying a₁ may modify a₂ (depending on the transform function). The shape of a₂ is in generally different than that of a₁. The result a₂ is mutable if and only if a₁ is, with the same element type restrictions, if any.

Procedure: array-transform array shape transform

This is a general mechanism for creating a view. The result is a new array with the given shape. Accessing this new array is implemented by calling the transform function on the index vector, which must return a new index vector valid for indexing the original array. Here is an example (using the same arr as in the array-index-ref example):
(define arr (array #2a((1 4) (0 4))
                   10 11 12 13 20 21 22 23 30 31 32 33))
(array-transform arr #2a((0 3) (1 3) (0 2))
  (lambda (ix) (let ((i (ix 0)) (j (ix 1)) (k (ix 2)))
                 [(+ i 1)
                  (+ (* 2 (- j 1)) k)]))) ⇨
#3a:3@1:2:2
║10│11║
╟──┼──╢
║12│13║
╠══╪══╣
║20│21║
╟──┼──╢
║22│23║
╠══╪══╣
║30│31║
╟──┼──╢
║32│33║
╚══╧══╝
The array-transform is generalization of share-array, in that it does not require the transform to be affine. Also note the different calling conversions for the tranform: array-transform takes a single argument (a vector of indexes), and returns a single result (a vector of indexes); share-array takes one argument for each index, and returns one value for each index. The difference is historical.

Procedure: array-index-share array index ...

This does the same generalized APL-style indexing as array-index-ref. However, the resulting array is a modifiable view into the argument array.

Procedure: array-reshape array shape

Creates a new array narray of the given shape, such that (array->vector array) and (array->vector narray) are equivalent. I.e. the i’th element in row-major-order of narray is the i’th element in row-major-order of array. Hence (array-size narray) (as specified from the shape) must be equal to (array-size array). The resulting narray is a view such that modifying array also modifies narray and vice versa.

Procedure: share-array array shape proc

Returns a new array of shape shape that shares elements of array through proc. The procedure proc must implement an affine function that returns indices of array when given indices of the array returned by share-array. The array does not retain a reference to shape.
(define i_4
   (let* ((i (make-array
                (shape 0 4 0 4)
                0))
          (d (share-array i
                (shape 0 4)
                (lambda (k)
                   (values k k)))))
      (do ((k 0 (+ k 1)))
          ((= k 4))
         (array-set! d k 1))
      i))
Note: the affinity requirement for proc means that each value must be a sum of multiples of the arguments passed to proc, plus a constant.

Implementation note: arrays have to maintain an internal index mapping from indices k1 ... kd to a single index into a backing vector; the composition of this mapping and proc can be recognised as (+ n0 (* n1 k1) ... (* nd kd)) by setting each index in turn to 1 and others to 0, and all to 0 for the constant term; the composition can then be compiled away, together with any complexity that the user introduced in their procedure.

Here is an example where the array is a uniform vector:
(share-array
  (f64vector 1.0 2.0 3.0 4.0 5.0 6.0)
  (shape 0 2 0 3)
  (lambda (i j) (+ (* 2 i) j)))
   ⇨  #2f64((1.0 2.0 3.0) (4.0 5.0 6.0))

Procedure: array-flatten array

Procedure: array->vector array

Return a vector consisting of the elements of the array in row-major-order.

The result of array-flatten is a fresh (mutable) copy, not a view. The result of array->vector is a view: If array is mutable, then modifying array changes the flattened result and vice versa.

If array is “simple”, array->vector returns the original vector. Specifically, if vec is a vector then:
(eq? vec (array->vector (array-reshape vec shape)))

Implementation

Acknowledgements

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Title

Author

Status

Abstract