ExpandCollapsePrev Next Index

+ 3.1 Array Types

Ok, so we have met the classification of arrays, and we have met the unsafe carray.

It is time to meet a safe ArrayObject! You will find varray in lib/std/varray.flx.

It is a variable length array, with a fixed bound on the length.

A varray is just a pointer, like a carray. This means when you pass one around you're passing a reference. If one procedure modifies the array, everyone shares the modification.

However a varray has two important properties not possessed by carrays. The first is that bounds checking is done, so use is safe. You can find the bound with the "maxlen" method. you can make an empty varray like this:

    var a = varray[int](20.size); // bound 20 elements

and append elements with

    a += 1; a += 2;
    println$ a;

varray(1, 2)

or with

    push_back (a, 3);
    println$ a;

varray(1, 2, 3)

[BUG: you cannot do a . push_back 3, I think this wold be better, it is easy enough to implement by:

    proc push_back (a: varray[T]) ( v:T) => push_back (a, v);

a nice one liner. Since varray is a definite type, this shouldn't lead to an ambiguity (famous last words ..)]

You can also pop values off the end of the array:

    println$ a;

varray(1, 2)

You can also make a varray from an array:

    var x = varray (1,2,3,4);
    println$ x;

varray(1, 2, 3, 4)

[This is a bit risky, because varray (10,20) is an varray of 10 elements initially set to 20, not an array of two integers! That's a BUG IMHO]

There are some other varray constructors. Look in the library! Subject to change (in particular to get rid of ambiguities!)

Now the second IMPORTANT property of varrays.

They're properly garbage collected. Ordinary carrays are NOT. The collector cannot collect values in a general carray allocated with malloc, because it doesn't know the length. With varrays, it does.

Varrays are built into the run time system. They're very special. A varray is just an ordinary pointer, together with a separate mapping between the pointer and the length. The length is NOT stored in the varray (unlke C++ vector).

This means a varray can be cast to a Carray and used in any C code that needs a C array thing, that is, a pointer.

Varrays have a safety property apart from the fact that index bounds are fully checked: the underlying array cannot move. It cannot be invalidated provided the varray or any element thereof is reachable (and you don't do something stupid like write C code that deletes one!).

A pointer into a varray may point to an uninitialised value or a previously used value no longer in use (and not tracked by the GC). But it cannot point to non-memory

NOTE: this isn't implemented yet. Basically we require a a subarray operation. Its safe to have a varray which is a subarray of another. Note that converting to an STL iterator (raw C pointer) is NOT safe.

varray is primarily useful as a buffer, for implementing darray, or for obtaining writable version of an array which is passed by reference. Because of the bound constraint, it is not useful for a generally dynamic length array.

We need darray for that! Now go and read the library code! A darray is a pointer to a varray. When the varray gets filled up a new bigger one is made. So darray is unbounded, but it loses the nice property that iterators into the array cannot be invalidated.

Finally, we have sarray and bsarray.

Sarray isn't really an array. Its a sparse array which has a few specified values, and the rest are defaulted. Typically a sparse array is used with type double and default 0.0 for applications such as distributions. A bsarray is just an sarray with a specified bound. The bound is necessary to allow iteration, which covers all the indexes (including ones that are defaulted!)

sarray is highly efficient, probably the most efficient implementation possible. It uses a Judy array (which is a cache-optimised digital tree) to map the array index from user index space to a darray. New indexes get put on the end of the darray. If any element is set to the default it frees up a slot in the darray for the next new index. A freelist is kept. The darray also expands in an efficient manner, copying the mapping index occasionally when it runs out of reserved space (SLOW!). sarray also supports a pack operation which removes all the unused slots in the map and sorts the indicies.

Once packed, sequential visitation of all values and also all non-default values is extremely fast, as it avoids the need to lookup the user index to internal index map, and because it *sequences* through the storage serially, allowing cache-prefetching. The pack operation itself is reasonably efficient (it's basically like a copying collector).

NOTE: sarray is a general sparse array. It performs the same for totally scattered indices as sets of compact ranges. There are faster implementations if you know you're using a set of compact subranges (clustered or clumped data).

Felix doesn't provide such an array yet. However the garbage collector actually uses one!! Indeed, it can pick whether an arbitrary machine word is a pointer into allocated store. Here, a memory object is a "clump" of contiguous allocated addresses.