Title

Enhanced multi-dimensional Arrays

Author

Per Bothner

Status

Abstract

This describe the array data type (a generalization of vectors to multiple indexes or dimensions), along with a set of procedures for working on them.

This specifiation is an extension of SRFI 25, with additions from Racket’s math.array package and other sources. It has been implemented in the Kawa dialect of Scheme.

Issues

The range API is new and untested. Some changes (including functions names) may amke sense. It may make sense to move it into a separate SRFI.

Right now we have shape-specifiers, and the SRFI-4 shape procedure, which creates one kind of shape-specifier. However, there is no array-shape procedure to return a shape as a single object. Instead, there is the array-rank, array-start, and array-end procedures to inspect the shape. Is that enough? Should we have an explicit shape type? (This is essentially the interval type of SRFI-122.) Should it be a distinct opaque type, or one of the shape specifier types?

Any important missing procedures? It might be useful to have array-copy, though if we have array-shape then (array-copy arr) is trivially defined as (array-reshape (array-flatten arr) (array-shape arr)).

Need alternate non-Kawa way to expression pseudo-ranges.

Any other Kawa-isms that need to beremoved?

Describe implementation.

Rationale

Specification

Arrays are heterogeneous data structures that generalize vectors to multiple indexes or dimensions. Instead of a single integer index, there are multiple indexes: An index is a vector of integers; the length of a valid index sequence is the rank or the number of dimensions of an array.

A generalized vector is a standard vector, but (depending on implementation) can also include uniform vectors, bytevectors, ranger, and strings (viewed as vectors of characters). Arrays are a extension of vectors: An array whose rank is 1, and where the (single) lower bound is 0 is a (generalized) vector. A library-based implementation of this specification should use the existing (generalized) vector type for "simple" rank-1 zero-lower-bound arrays (not created by share-array).

A rank-0 array has a single value. It is essentially a box for that value. Functions that require arrays may treat non-arrays as a rank-0 array containing that value.

An array of rank 2 is frequently called a matrix.

Arrays may be mutable or immutable.

The examples use the array literal syntax and format-array helper function of SRFI 163 for illustrative purposes. However, this specifcation does not require SRFI 163 (or vice versa), though they are designed to work well together.

(array? obj)

Returns #t if obj is an array, otherwise returns #f. Note this also returns true for a (generalized) vector.

Ranges

This should/could be moved out to a separate SRFI.

A range has a lower bound, upper bound, and a step that defaults to 1. It is a new type of immutable generalized vector with an optimized compact representation.

A range is an immutable sequence of values that increase linearly - i.e. by a fixed amount (the step) between each element.

(range-from start [step])

Creates an unbounded or non-finite range, where step defaults to 1. The result is an infinite sequence of values, starting with start, and followed by (+ start step), (+ start (* 2 step)), and so on. For example (range-from 3 2) is the odd integers starting at 3.

(range< end [start [step]])

(range<= end [start [step]])

In these cases the sequence counts up: The step must be positive, and defaults to 1, while start defaults to 0. The resulting values is the initial subsequence of (range-from start step) as long as the value x satisfies (< x end), or (<= x end), respectively. (Note: It might be more useful in the range<= case for start defaults to 1, but consistency is probably more desirable.)

(range> end start [step])

(range>= end start [step])

In these cases the sequence counts down: The step must be negative, and defaults to -1. The resulting values are those x such that (> x end), or (>= x end), respectively.

(range-take count [start [step]])

In this case the resulting range is the first cunt elements of the unbounded sequence. The expression (range-take count start step) yields the same sequence as SRFI 1's (iota count start step).

This specification is based on Kawa's range type, though Kawa has a prettier syntax.

Array shape

The shape of an array consists of bounds for each index.

The lower bound b and the upper bound e of a dimension are exact integers with (<= b e). A valid index along the dimension is an exact integer i that satisfies both (<= b i) and (< i e). The length of the array along the dimension is the difference (- e b). The size of an array is the product of the lengths of its dimensions.

A procedure that requires a shape can accept any of the follow shape-specifier values. We use as example a 2*3 array with lower bounds 0, and a 3*4 array with lower bounds 1.

Procedure: shape bound ...

Returns a shape, in the (r 2) form. The sequence bound ... must consist of an even number of exact integers that are pairwise not decreasing. Each pair gives the lower and upper bound of a dimension. If the shape is used to specify the dimensions of an array and bound ... is the sequence b0 e0 ... bk ek ... of n pairs of bounds, then a valid index to the array is any sequence j0 ... jk ... of n exact integers where each jk satisfies (<= bk jk) and (< jk ek).

The shape of a d-dimensional array is a d * 2 array where the element at k 0 contains the lower bound for an index along dimension k and the element at k 1 contains the corresponding upper bound, where k satisfies (<= 0 k) and (< k d).

(apply shape bounds) is equivalent to: (apply array (vector 2 (/ (length bounds) 2)) bounds)

Procedure: array-rank array

Returns the number of dimensions of array.

(array-rank
  (make-array (shape 1 2 3 4)))

Returns 2.

Procedure: array-start array k

Returns the lower bound (inclusive) for the index along dimension k. This is most commonly 0.

Procedure: array-end array k

Returns the upper bound for the index along dimension k. The bound is exclusive - i.e. the first integer higher than the last legal index.

Procedure: array-size array

Return the total number of elements of array. This is the product of (- (array-end array k) (array-start array k)) for every valid k.

Array construction

See also array-reshape.

Procedure: array shape obj ...

Returns a new array whose shape is given by shape and the initial contents of the elements are obj ... in row major order. The array does not retain a reference to shape.

Procedure: make-array shape

Procedure: make-array shape value...

Returns a newly allocated array whose shape is given by shape. If value is provided, then each element is initialized to it. If there is more than one value, they are used in order, starting over when the values are exhausted. If there is no value, the initial contents of each element is unspecified. The array does not retain a reference to shape.

(make-array #(2 4) 1 2 3 4 5) ⇨
╔#2a:2:4╗
║1│2│3│4║
╟─┼─┼─┼─╢
║5│1│2│3║
╚═╧═╧═╧═╝

Compatibility: Guile has an incompatible make-array procedure.

Procedure: build-array shape procedure

Construct a “virtual array” of the given shape, which uses no storage for the elements. Instead, elements are calculated on-demand by calling procedure, which takes a single argument, an index vector.

There is no caching or memoization.

(build-array #2a((10 12) (0 3))
  (lambda (ind)
    (let ((x (ind 0)) (y (ind 1)))
      (- x y)))) ⇨
#2a@10:2:3
║10│ 9│8║
╟──┼──┼─╢
║11│10│9║
╚══╧══╧═╝

Procedure: index-array shape

Return a new immutable array of the specified shape where each element is the corresponding row-major index. Same as (array-reshape [0 <: size] shape) where size is the array-size of the resulting array.

(index-array #2a((1 3) (2 6))) ⇨
#2a@1:2@2:4
║0│1│2│3║
╟─┼─┼─┼─╢
║4│5│6│7║
╚═╧═╧═╧═╝

Array indexing

Given a rank-2 array arr with integer indexes i and j, the following all get the element of arr at index [i j].

(array-index-ref arr i j)
(array-ref arr i j)
(array-ref arr [i j])

Using array-index-ref (but not plain array-ref) you can do generalized APL-style slicing and indirect indexing.

An optional extension (implemented in Kawa) allows you to call an array as it it were a function (function-call notation). This should be equivalent to calling array-index-ref.

(arr i j) ;; Optional extension, equivalent to (array-index-ref arr i j)

Procedure: array-ref array k ...

Procedure: array-ref array index

Returns the contents of the element of array at index k .... The sequence k ... must be a valid index to array. In the second form, index must be a vector (a 0-based 1-dimensional array) containing k ....

(array-ref (array [2 3]
              'uno 'dos 'tres
              'cuatro 'cinco 'seis)
   1 0)

Returns cuatro.

(let ((a (array (shape 4 7 1 2) 3 1 4)))
   (list (array-ref a 4 1)
         (array-ref a (vector 5 1))
         (array-ref a (array (shape 0 2)
                         6 1))))

Returns (3 1 4).

Procedure: array-index-ref array index ...

Generalized APL-style array indexing, where each index can be either an array or an integer.

If each index is an integer, then the result is the same as array-ref.

Otherwise, the result is an immutable array whose rank is the sum of the ranks of each index. An integer is treated as rank-0 array.

If marr is the result of (array-index-ref arr M1 M2 ...) then:

(marr i11 i12 ... i21 i22 ...)

is defined as:

(arr (M1 i11 i12 ...) (M2 i21 i22 ...) ...)

Each Mk gets as many indexes as its rank. If Mk is an integer, then it we use it directly without any indexing.

Here are some examples, starting with simple indexing.

(define arr (array #2a((1 4) (0 4))
                   10 11 12 13 20 21 22 23 30 31 32 33))
arr ⇨
╔#2a@1:3:4══╗
║10│11│12│13║
╟──┼──┼──┼──╢
║20│21│22│23║
╟──┼──┼──┼──╢
║30│31│32│33║
╚══╧══╧══╧══╝
(array-index-ref arr 2 3) ⇨
23

If one index is a vector and the rest are scalar integers, then the result is a vector:

(array-index-ref arr 2 #(3 1)) ⇨
#(23 21)

You can select a “sub-matrix” when all indexes are vectors:

(array-index-ref arr #(2 1) #(3 1 3)) ⇨
╔#2a:2:3═╗
║23│21│23║
╟──┼──┼──╢
║13│11│13║
╚══╧══╧══╝

Using ranges for index vectors selects a rectangular sub-matrix.

(array-index-ref arr (range< 3 1) (range< 4 1)) ⇨
╔#2a:2:3═╗
║11│12│13║
╟──┼──┼──╢
║21│22│23║
╚══╧══╧══╝

You can add new dimensions:

(array-index-ref arr #(2 1) #2a((3 1) (3 2))) ⇨
#3a╤══╗
║23│21║
╟──┼──╢
║23│22║
╠══╪══╣
║13│11║
╟──┼──╢
║13│12║
╚══╧══╝

FIX: Need alternate non-Kawa way to expression pseudo-ranges. The pseudo-range [<:] can be used to select all the indexes along a dimension. To select row 2 (1-origin):

(array-index-ref arr 2 [<:]) ⇨
#(20 21 22 23)

To reverse the order use [>:]:

(array-index-ref arr 2 [>:]) ⇨
#(23 22 21 20)

To select column 3:

(array-index-ref arr [<:] [3]) ⇨
#2a╗
║13║
╟──╢
║23║
╟──╢
║33║
╚══╝

To expand that column to 5 colums you can repeat the column index:

(range-take 5 3 0)
#(3 3 3 3 3)
(array-index-ref arr [<:] (range-take 5 3 0)) ⇨
╔#2a:3:5═╤══╤══╗
║13│13│13│13│13║
╟──┼──┼──┼──┼──╢
║23│23│23│23│23║
╟──┼──┼──┼──┼──╢
║33│33│33│33│33║
╚══╧══╧══╧══╧══╝

Modifying arrays

An implementation that supports SRFI 17 (generalied set!, should allow set! to modify one or multiple elements. To modify a single element:

(set! (arr index ...) new-value)

should be equivalent to

(array-set! arr index ... new-value)

You can set a slice (or all of the elements). In that case:

(set! (arr index ...) new-array)

is equivalent to:

(array-copy! (array-index-share arr index ...) new-array)

Procedure: array-set! array k ... obj

Procedure: array-set! array index obj

Stores obj in the element of array at index k .... Returns the void value. The sequence k ... must be a valid index to array. In the second form, index must be either a vector or a 0-based 1-dimensional array containing k ....

(let ((a (make-array
            (shape 4 5 4 5 4 5))))
   (array-set! a 4 4 4 "huuhkaja")
   (array-ref a 4 4 4))

Returns "huuhkaja".

Compatibility: SRFI-47, Guile and Scheme-48 have array-set! with a different argument order.

Procedure: array-copy! dst src

Compatibility: Guile has a array-copy! with the reversed argument order.

Procedure: array-fill! array value

Set all the values array to value. You can use (array-fill! (array-index-share array indexes...) value) to set a subset of the array.

Transformations and views

A view or transform of an array is an array a2 whose elements come from some other array a1, given some transform function T that maps a2 indexes to a1 indexes. Specifically (array-ref a2 indexes) is (array-ref a1 (T indexes)). Modifying a2 causes a1 to be modified; modifying a1 may modify a2 (depending on the transform function). The shape of a2 is in generally different than that of a1. The result a2 is mutable if and only if a1 is, with the same element type restrictions, if any.

Procedure: array-transform array shape transform

This is a general mechanism for creating a view. The result is a new array with the given shape. Accessing this new array is implemented by calling the transform function on the index vector, which must return a new index vector valid for indexing the original array. Here is an example (using the same arr as in the array-index-ref example):

(define arr (array #2a((1 4) (0 4))
                   10 11 12 13 20 21 22 23 30 31 32 33))
(array-transform arr #2a((0 3) (1 3) (0 2))
  (lambda (ix) (let ((i (ix 0)) (j (ix 1)) (k (ix 2)))
                 [(+ i 1)
                  (+ (* 2 (- j 1)) k)]))) ⇨
#3a:3@1:2:2
║10│11║
╟──┼──╢
║12│13║
╠══╪══╣
║20│21║
╟──┼──╢
║22│23║
╠══╪══╣
║30│31║
╟──┼──╢
║32│33║
╚══╧══╝

The array-transform is generalization of share-array, in that it does not require the transform to be affine. Also note the different calling conversions for the tranform: array-transform takes a single argument (a vector of indexes), and returns a single result (a vector of indexes); share-array takes one argument for each index, and returns one value for each index. The difference is historical.

Procedure: array-index-share array index ...

This does the same generalized APL-style indexing as array-index-ref. However, the resulting array is a modifiable view into the argument array.

Procedure: array-reshape array shape

Creates a new array narray of the given shape, such that (array->vector array) and (array->vector narray) are equivalent. I.e. the i’th element in row-major-order of narray is the i’th element in row-major-order of array. Hence (array-size narray) (as specified from the shape) must be equal to (array-size array). The resulting narray is a view such that modifying array also modifies narray and vice versa.

Procedure: share-array array shape proc

Returns a new array of shape shape that shares elements of array through proc. The procedure proc must implement an affine function that returns indices of array when given indices of the array returned by share-array. The array does not retain a reference to shape.

(define i_4
   (let* ((i (make-array
                (shape 0 4 0 4)
                0))
          (d (share-array i
                (shape 0 4)
                (lambda (k)
                   (values k k)))))
      (do ((k 0 (+ k 1)))
          ((= k 4))
         (array-set! d k 1))
      i))

Note: the affinity requirement for proc means that each value must be a sum of multiples of the arguments passed to proc, plus a constant.

Implementation note: arrays have to maintain an internal index mapping from indices k1 ... kd to a single index into a backing vector; the composition of this mapping and proc can be recognised as (+ n0 (* n1 k1) ... (* nd kd)) by setting each index in turn to 1 and others to 0, and all to 0 for the constant term; the composition can then be compiled away, together with any complexity that the user introduced in their procedure.

Here is an example where the array is a uniform vector:

(share-array
  (f64vector 1.0 2.0 3.0 4.0 5.0 6.0)
  (shape 0 2 0 3)
  (lambda (i j) (+ (* 2 i) j)))
   ⇨  #2f64((1.0 2.0 3.0) (4.0 5.0 6.0))

Procedure: array-flatten array

Procedure: array->vector array

Return a vector consisting of the elements of the array in row-major-order.

The result of array-flatten is a fresh (mutable) copy, not a view. The result of array->vector is a view: If array is mutable, then modifying array changes the flattened result and vice versa.

If array is “simple”, array->vector returns the original vector. Specifically, if vec is a vector then:

(eq? vec (array->vector (array-reshape vec shape)))

Implementation

TODO - describe Kawa implementation

Acknowledgements

Copyright

Copyright (C) Per Bothner 2018

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Author: Per Bothner