The LookupTable Intrinsic Class

A LookupTable is an unordered collection of values; each value indexed by a "key," which is a value of any type that's used to look up a value stored in the collection. In effect, this class provides what some programming languages call an associative array, because it allows a value to be associated with an arbitrary key, and then efficiently found given the same key.

To use lookup tables, you must define the LookupTable intrinsic class in your source code. The easiest way to do this is to include the system header file "lookup.h", which is included with the compiler.

Creating a LookupTable

You create a lookup table with the "new" operator:

  local tab = new LookupTable();

You can optionally specify two parameters to fine-tune the table's efficiency: the "bucket count," and the initial capacity. When you create a LookupTable with no arguments as shown above, the object uses default values for these parameters; the default values will always work, but you might be able to improve a table's performance by overriding the default values, especially if the table will contain an especially large or small number of entries. Note that it is never necessary to specify the parameters, since a lookup table will always work properly with the defaults; the only thing the parameters do is tune the object's performance.

  // override the default creation parameters

  // use 256 hash slots and an initial capacity of 1024 entries

  local tab = new LookupTable(256, 1024);

The first parameter, the bucket count, specifies the size of the hash table that the object uses to organize the keys. Refer to the section How a LookupTable Works, below, for details on what the bucket count means and how to select this value.

Note that the bucket count is fixed over the life of the object.

The second parameter, the initial capacity, is purely advisory. This parameter sets the amount of memory the table initially allocates so that it can accommodate the given number of stored values. If you eventually add more values to the table than the initial entry count, the object will automatically expand to accommodate more entries. For maximum efficiency, you should choose a value that's about the same as the number of entries you ultimately expect; this will ensure that you don't use excessive memory by allocating many more initial slots than you end up using, while at the same time avoiding repeated expansion of the table.

Storing and Retrieving Values

You use a lookup table as though it were a list or Vector, except that you can use any value as an index. For example, we could use a lookup table to associate objects with strings:

  symtab['book'] = book;

  symtab['ball'] = ball;

  symtab['table'] = table;

Now, if we want to look up the object associated with one of these words, we simply index the lookup table:

  str = 'ball';

  obj = symtab[str];

If you store a value in the lookup table, and a value was already stored at the same key, the old value at the key is discarded and replaced by the new value. If you look up a key, but the key was never stored in the table, the result is nil.

A LookupTable matches key values the same way the "==" operator does.

LookupTable Iterations

LookupTable is a subclass of Collection, so you can use the createIterator() method to create an Iterator to iterate over the elements of the lookup table. The Iterator that a LookupTable creates is called a LookupTableIterator; it visits the values stored in the table in an arbitrary order.

Because a LookupTable is a Collection, you can use the foreach statement to iterate over its values.

LookupTable Methods

LookupTable is a subclass of Collection, and thus includes all Collection methods. In addition, LookupTable defines the methods listed below.

applyAll(func) – for each element in the table, invokes the function func(key, value), and then changes the element's value to the return value of func. This allows you to modify all of the values in the table according to the given function. The keys are not changed. The entries are visited in arbitrary order. No return value.

forEach(func) – for each element in the table, invokes the function func(value). Any return value from func is ignored; the values and keys stored in the table are not changed. The entries are visited in arbitrary order. No return value.

forEachAssoc(func) – for each element in the table, invokes the function func(key, value). Any return value from func is ignored; the values and keys stored in the table are not changed. The entries are visited in arbitrary order. This argument is the same as forEach(), except that this method provides the callback with the key in addition to the value for each element. No return value.

getBucketCount() – returns the number of "hash buckets" in the table. Since the number of buckets is fixed over the life of the object, this simply returns the bucket count that was specified when the object was created.

getEntryCount() – returns the number of key/value entries in the table. This returns the number of entries actually stored in the table, which is unrelated to the initial capacity that was specified when the object was created.

isKeyPresent(key) – checks to see if an entry with the given key is present in the table. Returns true if the key is present, nil if not.

keysToList() – returns a list consisting of all of the keys in the table. The order of the keys is arbitrary, so callers must not assume any particular ordering.

removeElement(key) – removes the element with the given key, if any. If there is no element with the given key in the table, this makes no change to the table and simply returns nil. If an element is found, the element is removed, and the value associated with the key (before the removal) is returned.

valsToList() – returns a list consisting of all of the values in the table. The order of the values is arbitrary.

The WeakRefLookupTable Intrinsic Class

WeakRefLookupTable is a subclass of LookupTable. It behaves the same as the regular LookupTable class, and has the same methods; the only difference is that the values in a weak-reference table are only "weakly" referenced. (The keys are still "strongly" referenced; only the values are weak references.)

A weak reference is a reference that prevents the garbage collector from removing the referenced object from memory. On each scan of memory, the garbage collector deletes each object it finds that is not referenced at all, and each object that is referenced exclusively through weak references. This means that if an object is referenced only as a value stored in a WeakRefLookupTable, the garbage collector can delete the object. Whenever the garbage collector deletes an object that is stored as a value in a WeakRefLookupTable, the WeakRefLookupTable automatically deletes that key/value pair.

(Weak references have their uses, especially in situations where you want to create a secondary access path to a set of objects for performance reasons, such as an index or a cache. These cases are relatively rare, though, so don't feel that you need to go out of your way to find places to use this intrinsic class. If you can't think of a reason to use this class, just ignore it and use the base LookupTable class.)

How a LookupTable Works

A LookupTable is internally implemented as a "hash table," which is programming jargon for a particular way of organizing a set of values for efficient retrieval. For the most part, there is no practical need for you to know anything about this internal organization, because it's largely invisible when you're using the class. However, the class lets you optionally specify some parameters at creation time that can affect its efficiency, so we'll briefly explain how hash tables work to help you understand how to select these parameters.

A hash table operates by computing a hash value for each key. A hash value is an integer calculated from the key value, and once calculated, the hash value is used to select a bucket to store the key. Since the hash value selects a bucket, the range of the hash is determined by the number of buckets: if we have ten buckets, we calculate a hash value from 1 to 10 for each key.

Ideally, each distinct key value would end up in its own bucket, but this obviously cannot be the case: the number of buckets we have is finite, and almost always smaller than the number of possible key values. Even if we consider only strings as possible keys, we can clearly create more unique values than buckets, no matter how many buckets we use; and then we have lists and all the other types to hash as well. Therefore, a hash table must be able to cope with "hash collisions," where two distinct key values have the same hash value.

Hash tables handle collisions by using the hash value merely as a first-order approximation to finding the key; once we compute the hash value, and thus know the bucket, we must look at each key stored in that bucket to find the particular key we're seeking. If we have N entries stored in a hash table, and M hash buckets, we will have N/M values, on average, stored in each bucket. The more buckets we have, the fewer values we'll have in each bucket, so the faster lookups will be. On the other hand, once the number of buckets is as large as the number of values stored, adding more buckets becomes less helpful. In addition, the greater the number of buckets, the more memory the hash table consumes, so we trade speed for greater memory usage.

Note that it isn't possible in general to guarantee that a hash table will have no collisions, no matter how many buckets you use; if lots of the keys you store all happen to hash to the same value, you still have many keys in one bucket. Fortunately, in practice, most sets of keys end up distributing fairly evenly.

When you create a LookupTable object, you can specify the number of buckets in the object's hash table. If you have some idea of how much data the table might contain, this lets you set up a LookupTable to make it more likely that there will be an efficient distribution of hash values while not using excessive memory. In particular, you should choose the number of buckets so that it is a reasonably high fraction of the number of values you expect, although just how high a fraction depends on the total number of entries as well. For example, if you expect only ten or twenty values to be stored in the table, you should probably use a bucket count of about the same number; if you expect a thousand values, six or seven hundred buckets might be reasonable.