Dictionaries

To facilitate command parsing, TADS 3 provides an intrinsic class called "Dictionary."  A Dictionary object stores associations between "keys" and objects, and can be efficiently searched for a key to yield all of the objects associated with the key.

 

A "key" is a combination of a string and a property ID.  The Dictionary class includes a property ID with each key so that object associations can be differentiated by type; each property is a type of association.  In practical terms, the property used in a dictionary key will usually be a vocabulary property, such as "noun" or "adjective."   A given word might be used with some objects as a noun, but with other objects as an adjective; for example, "card" might be used as a noun with an "ID Card" object, but as an adjective with a "card slot" object.  The Dictionary class differentiates by property ID to allow for searching a Dictionary object for the objects matching a word in a particular usage context.

 

You must define the Dictionary intrinsic class to use Dictionary objects in your program.  The easiest way to do this is to include the system header file <dict.h>, which is included with the compiler.

Comparators

The Dictionary class allows the program to customize the way input strings are matched to dictionary strings.  This customization is handled through a "comparator" object; this is simply an object that provides a specific set of methods, which the Dictionary calls to perform its comparisons.

 

You can set a Dictionary object's comparator at any time by calling the Dictionary's setComparator() method.  This method takes the comparator object as an argument.  When you set the comparator, subsequent look-up operations are performed using the matching rules contained in the new comparator.

 

The intrinsic class StringComparator provides a customizable comparator implementation, compatible with the Dictionary requirements, that is very fast (since it's implemented as native code in the interpreter).  When the StringComparator class doesn't provide enough flexibility, you can define your own comparator by implementing the set of required methods.

 

The Dictionary class assumes that comparator objects are immutable.  That is, once a comparator is installed in a dictionary, the dictionary assumes that the return values from the required methods, for a given set of argument values, will never change.  This is important because it allows the dictionary to assume it will never need to rebuild its internal tables except when a new comparator is installed.  If you want to change the comparison rules, you must create and install a new comparator object; never attempt to change the comparison rules by modifying the comparator itself.

 

A comparator object must define the following methods:

 

calcHash(str) – calculate and return the "hash code" for the given string.  The hash code is simply an arbitrary integer value; its purpose is to allow the dictionary to divide its overall set of words into many bins, which reduces the amount of work necessary when searching for a particular string by limiting the search to the strings in one bin.  When storing a string, the dictionary computes the hash value by calling this method, then stores the string in a bin specified by the hash value.  When looking up a string, the dictionary computes the string's hash value, and then only has to look in the bin given by the hash value.

 

Note that this function doesn't have to worry about how many bins the dictionary is using.  The dictionary will automatically take the value modulo the bin count (i.e., the dictionary will divide the value returned from calcHash() by the number of bins, and use the integer remainder of this division to select the bin; this will inherently be in the correct range).

 

The hash value returned from this method is arbitrary, hence the method can implement any algorithm it wants for computing the value.  However, there is one absolute requirement that the method must obey, and a couple of properties that the algorithm should have:

 

 

matchValues(inputStr, dictStr) – match the input string inputStr to the dictionary string dictStr.  Returns zero or nil if the values do not match, or any other value if they do match.  Because any non-zero and non-nil value indicates a match, this routine is free to encode additional information about the match in the return value; typically, implementations use bit flags (a combination of binary values OR'd together) for this purpose.

 

Note the asymmetry of the argument: the first argument is always an input string, which is a string to be matched (such as the string argument to the findWord() method); the second is always a dictionary string, which is a string already stored in the dictionary that is being compared with the input word.  Some implementations might take advantage of this distinction to permit certain types of approximations in input strings; for example, an implementation might allow an input string to be abbreviated to a leading substring of the dictionary string, so that "flashl" matches "flashlight," but not vice versa.

Compiler Support

The TADS 3 compiler has built-in support for the Dictionary intrinsic class.  (This built-in support is part of the compiler, not the interpreter; as far as the interpreter is concerned, a Dictionary object is the same as any other kind of object.)  The compiler support is provided by two new statements, dictionary and dictionary property, and by recognition of dictionary properties when defined in object property lists.

 

The dictionary statement has two purposes.  First, it defines a new object instance of class Dictionary, if the name has not been previously defined.  Second, it establishes the active dictionary, which is the dictionary into which the compiler will insert each dictionary word defined with an object.

 

The dictionary property statement tells the compiler that a given property is henceforth a dictionary property.  This means that, whenever the property is subsequently used in an object definition, the property defines dictionary words for the object.  The words are entered into the active dictionary under the given property.

 

Here's some sample code that demonstrates how these statements work.

 

// create a new dictionary and make it active
dictionary myDict;
 
// establish the dictionary properties
dictionary property noun, adjective, plural;
 
// define an object
redBook: Book
    noun = 'book' 'booklet'
    adjective = 'red'
;

 

The effect of these new statements makes the code needed to define an object with vocabulary words very similar to the way it looked in TADS 2.  In particular, note that dictionary properties accept the special implied list syntax that TADS 2 used for vocabulary words: there is no need to enclose a vocabulary property's string list in square brackets.  In fact, the TADS 3 compiler doesn't even allow list notation for dictionary properties.

 

Even though the code looks similar to the way it did in TADS 2, though, there are some important differences.

 

First, the noun and adjective properties at run-time behave like normal properties: the compiler sets the values of these properties to simple string lists containing the strings actually defined, plus any strings inherited from superclasses.  You can use and modify these properties during program execution just like any other; however, note that, because these properties are nothing special to the interpreter, changing them has no effect on the dictionary.

 

Second, the dictionary object created by the dictionary statement is a full-fledged object at run-time.  This means that you can call Dictionary methods on this object in your program, to look up words and modify the dictionary dynamically.

Dictionary Methods

The object is an instance of the intrinsic class Dictionary, which provides the following methods:

 

addWord(obj, str, &voc_prop) – add an object to the dictionary with the given string and property key.  The argument str can be a string value, or can be a list of strings; if str is a list of strings, the result is the same as calling addWord() for each string in the list.  If the word association to be added is already defined (i.e., another entry with the same string, object, and property already exists), the new association is simply ignored.

 

findWord(str, &voc_prop?) – search the dictionary for the given string and property ID.  Returns a list giving all of the matching objects; if there are no objects, returns an empty list.  If voc_prop is omitted or is nil, the method returns all of the objects that match the string for any property.

 

The returned list consists of two elements for each matching object.  The first element of each pair is the matching object; the second element is the return value from the current comparator's matchValues() method for the match.  For example, suppose that the comparator simply returns nil for no match or true for a match; the result list might then look like this:

 

  [blueBook, true, booklet, true, matchbook, true]

 

The matchValues() result is included because some comparators use the result to supply additional information about the match.  For example, StringComparator encodes information on case folding, truncation, and equivalence mappings.  The caller could use this information to choose some matches over others, for example.

 

isWordDefined(str) – searches the dictionary for the given string and determines if it's associated with any objects.  If the word occurs in the dictionary at all, this function returns true; otherwise, it returns nil.

 

forEachWord(func) – invokes the callback function func on each word association in the dictionary.  The callback is invoked as func(obj, str, voc_prop), where obj, str, and voc_prop have the same meanings they do in addWord().  Note that a given string can appear many times in a dictionary, so func can be invoked with the same str value multiple times, once for each string/object/property association.

 

removeWord(obj, str, &voc_prop) – removes from the dictionary the object's association with the given string and property key.  Only the specific association of object, string, and property is removed; if the same object is also associated with other strings, the object is not removed from those other associations, and likewise if the same string and property are associated with different objects, those object associations are not removed.  str can be a string value or a list of strings; if it's a list of strings, the result is the same as calling removeWord() for each string in the list.  If the word association to be removed is not defined, the operation is simply ignored.

 

setComparator(compObj) – set the comparator object.  All subsequent dictionary operations are performed with the new comparator object.  If compObj is nil, the dictionary simply matches input strings to dictionary strings exactly.

 

Once installed, the dictionary considers a comparator object to be immutable.  That is, the dictionary assumes that the return values from the required methods of the comparator for a given set of inputs will never change.  If you want to change the comparison rules for a dictionary, you must create and install a new comparator object: never attempt to change the comparison rules by modifying an existing comparator.

 

Note that changing the comparator is a relatively expensive operation, because the dictionary must rebuild its internal table for the new comparator; programs should thus avoid changing comparators except when necessary.

Notes

There are a few other important things to know about dictionaries.

 

Dictionaries and garbage collection:  A dictionary references its objects "weakly."  This means that adding an object to a dictionary does not prevent the garbage collector from deleting the object.  If an object is referenced in a dictionary, but the object becomes otherwise unreachable, the garbage collector will delete the object, and will at the same time remove all of the object's associations from the dictionary.  While this might seem strange at first glance, it is actually very useful, because it means that you don't have to worry about manually deleting dictionary references to objects that have become unneeded.

 

Because dictionaries reference their entries weakly, the mere presence of an object in a dictionary never keeps the garbage collector from deleting the object.  Hence, if an object in a dictionary becomes otherwise unreachable, the collector will eventually delete the object.  This is normally desirable, because dictionaries are usually used only as a performance optimization to make it quicker to find an object given its name.  However, it is sometimes the case that a dictionary is the primary way to access a group of objects; in these cases, the weak references that a dictionary uses are undesirable because they don't prevent your objects from being deleted.  In these cases, you must create some other explicit reference to your objects; an easy way to accomplish this is to add each such object to a list stored in a property of a static object.  You must take care to remove these explicit references when their objects are no longer needed, so that the objects can be deleted.

 

Creating dictionaries dynamically:  You can create a dictionary object at run-time using the new operator.  Note that, in order to do this, you must define the Dictionary intrinsic class to the compiler, which you would normally accomplish simply by including the dictionary system header file.

 
// include the system header defining the Dictionary class
#include <dict.h>
 
myFunc()
{
    // create a new dictionary with truncation length 9
    local dict = new Dictionary(9);
 
    // add some words
    dict.addWord(redBook, 'red', &adjective);
    dict.addWord(redBook, 'book', &noun);
}

 

Using multiple dictionaries in the compiler:  You can use multiple dictionaries at compile-time simply by using a new dictionary statement for each dictionary.  You can switch back to an existing dictionary with another dictionary statement naming the original dictionary.  The dictionary statement establishes the active dictionary, which remains in effect until the next dictionary statement.  Multiple dictionaries might be useful in certain situations, such as when you want to create different parsing modes, each having their own separate vocabulary words.

 

Dynamic object creation and dictionaries:  When you create a new object with operator new, the interpreter will not automatically add any vocabulary to any dictionary for the object.  While this might seem a deficiency, remember that the interpreter doesn't think of dictionary objects as anything special, so it doesn't have any idea that you might want some random object creation to have a side effect on a completely unrelated object (such as a dictionary).  However, in most cases you will probably not notice this lack of automatic action, because libraries should be able to handle this more or less transparently through inheritance.  In particular, most libraries will probably want to put something like this in their lowest level classes:

 

class Thing: object
    construct()
    {
        // add all of my vocabulary to the default dictionary
        libDict.addWord(self, noun, &noun);
        libDict.addWord(self, adjective, &adjective);
        libDict.addWord(self, plural, &plural);
    }
;

 

As long as each class derived from Thing (or from subclasses of Thing) properly inherits its superclass constructor from its own constructor, the library code will take care of adding the vocabulary words for the new instance, taking advantage of the accessibility of the dictionary property values that TADS 3 provides.

 

Dictionary vs. LookupTable:  The Dictionary class is in some ways similar to the LookupTable intrinsic class, in that both objects allow you to associate keys with values and efficiently look up a value by key.  However, Dictionary is considerably more specialized than the general-purpose LookupTable: Dictionary allows multiple values to be associated with a key, and provides the composite property/string key.    Dictionary is also designed specifically to store its specialized data, and makes more efficient use of memory than a more general implementation would.  In addition, Dictionary allows the string comparison rules to be customized through the comparator object.