Preprocessor

TADS 3 provides a number of extensions to the TADS 2 preprocessor, for greater power and more compatibility with C and C++ preprocessors.

The #include directive uses an improved searching algorithm, and allows URL-style portable notation for relative paths (to subdirectories).
The #include directive expands macros in its filename argument.
You can define macros with substitution parameters, including varying argument lists.
The # (stringizing) and ## (token pasting) operators are supported. In addition, the #@ (single-quote stringizing) operator is provided.
The token pasting operator performs string concatenation when pasting string tokens together.
The #if and #elif directives are supported.
The #line directive is supported.

Include file searching

The #include directive uses an improved algorithm to search for included files; the new algorithm is essentially the same as that used by most C compilers. For files whose names are specified in double quotes, the compiler first looks in the directory containing the including file, then in the directory containing the file that included the including file, and so on until it reaches the original source file; if none of these directories contain the file then the search proceeds as for angle-bracketed files. For angle-bracketed files, the compiler searches in each directory specified in the include path (specified with the t3make -I option), in the order specified by the user (i.e., in the order in which the -I options appear). Finally, the compiler searches in the compiler directory itself.

Note that the compiler automatically adds each library's directory to the include path. That is, for each library (.tl) file in the build, the compiler adds the directory containing that library to the include path. This applies to libraries directly included in the build via the "-library" directive, as well as to libraries included in the build by other libraries. The effect is exactly the same as including a -I option for each library directory.

URL-style relative paths in #include filenames

Filenames in #include directives can refer to relative paths (such as to subdirectories) using relative URL-style notation. (A URL is a "uniform resource locator," which is a World Wide Web standard mechanism for specifying names of things like files.) In particular, you can use a forward slash, "/", as the path separator character, regardless of your local system's filename conventions. Even if you're running on Windows (which uses "\" as the path separator) or Macintosh (which uses ":" as the separator and has rather different rules than Unix-style systems), you can use "/" to specify relative paths, and your source will compile correctly on all systems.

For example, suppose you're using a Macintosh, and you have your source files in a folder called "My HD:My TADS Games:Caverns of Gloom". Now, suppose you decide that you'd like to keep your include files in a subfolder of this folder called "Include Files". You could write your #include lines like so:

#include ":Include Files:defs.h"

But if you did this, and then you gave a copy of your source code to a friend running Windows, your friend wouldn't be able to compile the code without changing that #include directive to match Windows path name conventions.

The solution to this problem is to use URL-style path names. A URL-style path name works the same on every system, so your friend on Windows will be able to compile your code without changes if you use this notation. To use URL notation, use "/" as the path separator, and - regardless of local conventions - place slashes only between path elements, never at the beginning of the filename. So, we'd rewrite the example above like this:

#include "Include Files/defs.h"

Even though this doesn't look a thing like a Mac path name, the Mac version of the TADS 3 compiler will happily find your file in the correct directory, because the Mac version knows how to convert URL-style path names to the correct Mac conventions, just like the Windows and Unix versions know how to convert URL-style names to their own local conventions.

The specific rules for parsing #include filenames are as follows:

For "local" files - that is, filenames enclosed in "double quotes" rather than in <angle brackets> - start looking for the file in the directory containing the source file with the #include, then in the directory that contains the file that included that file, then in that includer's directory, and so forth until we run out of includers.
For local files that we can't find using the first step above, and for all non-local files (specified in <angle brackets>), look in each directory on the include path. The include path is the set of directories specified with -I command-line options, and is searched in the order in which the -I options appeared.
If the file still can't be found, and it appears to be an "absolute" path using the local path naming conventions (which vary by system), try again treating the name as an absolute path. An absolute path is one that fully specifies the location of the file; on Unix, for example, this is a path starting with a "/", and on Windows it's a path starting with a drive specifier (such as "C:").

For each of the first two steps above, the compiler tries twice in each directory it searches:

First, the compiler tries treating the filename as using URL-style syntax. This means that the compiler converts all "/" characters to the local path separator characters, and performs any other transformations required to make the name appear relative. So, on the Mac, the compiler converts "Include Files/defs.h" to ":Include Files:defs.h", then tries looking for that file relative to the current directory being searched.
Second, if the first step fails to find a file, and the file does not use an absolute path in the local naming conventions, the compiler tries to treat the name as using local conventions. So, the compiler simply uses the local path name conventions to look for the file in the current directory being searched.

For all of these rules, the compiler stops searching as soon as it finds an existing file.
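The search order described above can be modeled in a few lines of Python. This is only a sketch of the rules as stated here, not the compiler's implementation: the names search_order and find_include, the sep parameter, and the exists callback are all invented for illustration. The sketch covers relative names only, and leaves out the absolute-path fallback and the compiler's own directory.

```python
def search_order(name, is_local, includer_dirs, include_path, sep="/"):
    """List the paths tried for a relative #include name, in order.

    includer_dirs: directory of the including file first, then the directory
    of its includer, and so on (consulted only for double-quoted names).
    include_path: directories from the -I options, in command-line order.
    sep: the local path separator (hypothetical parameter for this model).
    """
    dirs = (list(includer_dirs) if is_local else []) + list(include_path)
    url_form = name.replace("/", sep)  # URL-style reading of the name
    candidates = []
    for directory in dirs:
        # In each directory: URL-style reading first, then the name as written.
        for variant in (url_form, name):
            path = directory + sep + variant
            if path not in candidates:
                candidates.append(path)
    return candidates


def find_include(name, is_local, includer_dirs, include_path, exists, sep="/"):
    """Return the first candidate for which exists(path) is true, else None."""
    for path in search_order(name, is_local, includer_dirs, include_path, sep):
        if exists(path):
            return path
    return None
```

For a double-quoted name, the includer directories come before the -I directories; for an angle-bracketed name, only the -I directories are tried. The search stops at the first hit, which is why find_include returns as soon as exists reports a match.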

Note that even though the compiler accepts local filenames, we strongly encourage using URL-style filenames for all #include files that specify paths, since this will ensure that your source code will compile without changes on other platforms.

#include filename macro expansion

The #include directive expands any macros that appear in the filename, if the filename is not enclosed in quotes or angle brackets. The result of expanding any macros must yield a string delimited in quotes or angle brackets. (This feature is obscure, but TADS 3 includes it for completeness of its ANSI C preprocessor compatibility.)

Preventing multiple inclusion: #pragma once, #pragma all_once

In most cases, you don't want to include the same header file more than once in the same source file, because including the same file several times can often cause compiler errors by repeating the same object or function definitions. In many cases, avoiding multiple inclusion is trickier than just removing redundant #include directives from your source files; it often happens that the same header is #include'd in several other headers, so that first header would be included multiple times if more than one of those other headers were needed in the same source module. You can sometimes solve the problem by carefully adjusting all of the headers so that each file is only included once, but it's often difficult to get this right, especially if you add other source files in the future.

C/C++ programmers usually solve this problem by enclosing the entire contents of each header file in a protective series of #ifdef-type directives. Each header file follows this pattern:

#ifndef MY_HEADER_NAME_H

#define MY_HEADER_NAME_H

// the rest of the contents of the header go here

#endif

The idea is that the first time this file is included, the preprocessor symbol MY_HEADER_NAME_H (which is usually based on the name of the header file: for myheader.h, we'd use MYHEADER_H) is undefined, so the #ifndef ("if not defined") test succeeds. Everything between the #ifndef and the matching #endif is therefore compiled, including the #define for the same symbol. If the same file is included again, MY_HEADER_NAME_H is already defined, because of the #define that was compiled the first time around; the #ifndef test fails, and the compiler skips everything up to the matching #endif.

The TADS 3 compiler provides a somewhat simpler and more direct way of protecting against multiple inclusion. You can put this directive anywhere in a header file:

#pragma once

This tells the compiler that the current file should only be included once for the current source module. If the same file is included again, the compiler simply ignores the redundant #include directive.

The #pragma once directive only affects the current header file. There's a separate directive, #pragma all_once, that tells the compiler that every header file should be included only once. Put #pragma all_once into your main source file before including the first header, and the compiler will ignore every redundant #include directive.
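To make the effect of the two pragmas concrete, here is a small Python model of include processing. It is a sketch under stated assumptions, not how the compiler is implemented: the function name preprocess, the files dictionary (mapping a file name to its lines), and the state dictionary are all hypothetical, and the model handles nothing but #include and the two pragmas.

```python
def preprocess(name, files, state=None):
    """Return the lines of `name` with #include directives expanded.

    files: dict mapping a file name to its list of lines (a stand-in for disk).
    """
    if state is None:
        state = {"seen": set(), "once": set(), "all_once": False}
    # A redundant #include is ignored if the file asked for it with
    # #pragma once, or if #pragma all_once is in effect.
    if name in state["seen"] and (state["all_once"] or name in state["once"]):
        return []
    state["seen"].add(name)
    output = []
    for line in files[name]:
        text = line.strip()
        if text == "#pragma once":
            state["once"].add(name)
        elif text == "#pragma all_once":
            state["all_once"] = True
        elif text.startswith("#include "):
            target = text[len("#include "):].strip().strip('"<>')
            output.extend(preprocess(target, files, state))
        else:
            output.append(line)
    return output
```

If two headers both include a third header that contains #pragma once, the third header's contents appear only once in the output; without the pragma they appear twice.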

Macro substitution parameters

You can now define substitution parameters in your macros. These parameters, when they appear in the expansion text, are replaced during preprocessing with text specified in the macro's invocation. For example, suppose we define this macro:

#define ERROR(msg)  tadsSay('An error occurred: ' + msg + '\n')

Now suppose we write this in our code somewhere:

   ERROR('invalid value');

During compilation, the preprocessor will expand this macro invocation, substituting the actual parameter value when msg appears in the replacement text. The resulting expansion is:

   tadsSay('An error occurred: ' + 'invalid value' + '\n');

(It is worth pointing out that the compiler will subsequently compute the constant value of this string concatenation, so this will not result in any string concatenation at run-time.)
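The substitution step itself can be pictured with a short Python sketch. This is a simplified model, not the preprocessor's algorithm: expand_macro is a made-up name, and the sketch replaces every identifier that matches a formal parameter without tokenizing the text, so it would also touch a parameter name that happened to appear inside a string literal.

```python
import re


def expand_macro(params, body, actuals):
    """Replace each formal parameter in `body` with its actual argument text."""
    table = dict(zip(params, actuals))
    # One pass over the identifiers: substituted text is not rescanned
    # for parameter names.
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: table.get(m.group(0), m.group(0)),
                  body)
```

Calling expand_macro with the parameter list ["msg"], the expansion text of the ERROR macro above, and the actual argument 'invalid value' produces the same tadsSay call shown in the expansion.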

The TADS 3 preprocessor uses the ANSI C rules for macro expansion with regard to recursive macro invocation and circular definitions. These rules are complex and the need to know them arises quite infrequently, so it is not worth trying to explain them here; authors who are curious should refer to a good ANSI C programming book for the details.

Macros with Variable Argument Lists

A macro can be defined to take a varying number of arguments, which is especially useful when the macro calls a function or method with a varying number of parameters. Although the 1999 ANSI C specification includes a varying macro argument feature, the ANSI C version is quite limited, and TADS 3 diverges from the ANSI definition.

Defining a variable-argument macro

To define a macro with varying arguments, place an ellipsis ("...") immediately after the last parameter in the macro's formal parameter list:

#define ERROR(msg, args...) displayError('Error:', msg, args)

The "..." after the last argument tells the preprocessor that the macro allows zero or more arguments in place of the last parameter, so the ERROR() macro defined above will accept one or more arguments.

Simple expansion with the variable list parameter

During expansion, the parameter name of the varying argument will be replaced by the entire varying part of the argument list, including the commas between adjacent arguments, but not including the comma before the first varying argument. For example:

#define VAR(a, b...) { b }

This macro will expand as follows:

VAR(1)       -> { }

VAR(1,2)     -> { 2 }

VAR(1,2,3,4) -> { 2,3,4 }

Expansion with no variable arguments, and deleting the extra comma

If the varying part of the list contains zero arguments, note that it is replaced by nothing at all. In some cases, this can be problematic; for example, in the ERROR macro defined above, consider this expansion:

ERROR('syntax error') -> displayError('Error:', 'syntax error', )

Note the extra comma after the last argument to displayError – the comma is from the original expansion text in the macro definition, not from the parameter "args", which is empty in this case because no varying arguments were supplied. The extra comma will cause a syntax error when the function call is compiled, so the macro as written is not compatible with an empty varying argument list, even though the preprocessor will allow it.

To correct this problem, we can use a special bit of syntax; this is a horrible kludge, but the GNU C preprocessor uses the same kludge, so at least it's not completely pulled out of thin air. The token pasting operator (described in more detail below), "##", has a special meaning when it appears after a comma and before a varying argument macro parameter: when (and only when) the varying list is empty, the "##" operator deletes the preceding comma. This only works with commas – if anything else precedes the "##" operator, the operator works as it would in normal (non-varying arguments) cases. We can use this feature to rewrite the ERROR macro:

#define ERROR(msg, args...) displayError('Error:', msg, ## args)

Now when we expand this macro with no additional arguments, the extra comma is deleted:

ERROR('syntax error') -> displayError('Error:', 'syntax error')

ERROR('token error', 1) -> displayError('Error:', 'token error', 1)
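The comma-deletion rule can be modeled in Python as follows. This is an illustrative sketch only: expand_varying is a hypothetical name, the model works on raw text rather than tokens, and it handles only the ", ## name" form and the plain use of the varying parameter.

```python
import re


def expand_varying(body, name, args):
    """Expand the varying parameter `name` in `body` with the list `args`."""
    joined = ", ".join(args)
    symbol = re.escape(name) + r"\b"

    def comma_paste(match):
        # ", ## name": keep the comma only if there is something to follow it.
        return ", " + joined if args else ""

    body = re.sub(r",\s*##\s*" + symbol, comma_paste, body)
    # A plain use of the parameter expands to the whole list, even if empty.
    return re.sub(r"\b" + symbol, lambda m: joined, body)
```

With an empty list, the ", ## args" form drops the comma, while the plain "args" form leaves the stray comma behind, which is the syntax error described above.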

Iterative expansion with #foreach

The comma-deleting feature of the "##" operator is useful as far as it goes, but sometimes it's useful to construct more elaborate expansions from varying arguments. For example, suppose we wanted to concatenate the arguments to the ERROR macro together – in other words, we'd like the expansion to look like this:

ERROR('token error', 1, 2) -> displayError('Error:' + 'token error' + 1 + 2)

This is clearly beyond the scope of what we've seen so far. Fortunately, the TADS 3 preprocessor has another feature that makes this sort of construction possible: the #foreach operator. This operator must immediately follow – with no intervening spaces – the varying argument name, and must be immediately followed with a "delimiter" character. Following the delimiter is the main iteration expansion, which ends at the next instance of the delimiter character. Following the second delimiter is the "interim" expansion, which itself ends at the next instance of the delimiter.

You can choose any non-symbol character for the delimiter, as long as it doesn't appear in any of the expansion text. A non-symbol character is anything that can't appear in a symbol; symbols are made up of alphabetic characters, numerals, and underscores, so any other character will do. The point of letting you choose your own delimiter is to allow you to use anything in the expansion text by choosing a delimiter that doesn't collide with the expansion.

Note that you should be careful if you choose a forward slash ("/") as the delimiter – the preprocessor removes comments before processing macros, so if you have an empty section, the two consecutive slashes will be taken as the start of a comment and removed, along with the rest of the line. You're probably better off avoiding "/" as the delimiter.

This sounds a bit complicated, so let's see an example:

#define ERROR(msg, arg...) displayError('Error: ' + msg arg#foreach: +arg ::)

The first part of the macro is simple:

  displayError('Error: ' + msg

This part expands in the familiar way.  Now we come to this sequence:

  arg#foreach: +arg ::

Remember that the #foreach operator must appear immediately after the varying argument name, as we see here. After the #foreach operator, we have the delimiter; in this case, we've chosen ":", since we don't need any colons in our expansion text. We could just as well have chosen any other character; all that matters is that we don't need the character anywhere in our expansion, since the next appearance of this character terminates the expansion.

So, we have two sub-parts, delimited by colons. The first subpart is " +arg ", and the second subpart is empty.

The first subpart is the main iteration expansion. The preprocessor expands this part once for each actual varying argument, expanding the varying argument name in this part to merely the current argument in the varying list. In the rest of the macro, remember that the varying argument name expands to the full varying list; in a #foreach, though, the varying argument name expands merely to the single, current argument.

The second subpart is the interim iteration expansion. The preprocessor expands this part once for each actual varying argument except for the last one. This is why we call it the "interim" expansion – it is expanded between each iteration.

Let's look at how the macro expands. Consider this invocation:

  ERROR('syntax error')

In this case, we have no varying arguments at all, so the entire #foreach sequence – from the "arg#foreach" part to the final colon – is iterated zero times, and hence expands to nothing at all. The expansion is thus:

  displayError('Error: ' + 'syntax error' )

Note that we don't have any problem handling the zero varying arguments, since the entire iteration simply occurs zero times in this case.

Now consider what happens when we include some arguments:

  ERROR('token error', 1, 2)

This time, the #foreach sequence is iterated twice. The first time, "arg" expands to "1", since that's the first varying argument, and the second time, "arg" expands to "2". The two iterations are expanded like this:

+1

+2

These are concatenated together, so the result looks like this:

  displayError('Error: ' + 'token error' +1 +2)

The "interim" portion is useful for solving the same kinds of problems as the "##" comma deletion feature, but is more general. Since the interim portion appears only between each adjacent pair of varying arguments, it is useful for building lists of zero or more arguments. For example, suppose we want to write a macro that adds zero or more values:

  #define ADD(val...) val#foreach:val:+:

If we call this with no arguments, the expansion will be empty, because we'll iterate the #foreach zero times. If we call this with one argument, the result will simply be the argument: we'll iterate the #foreach one time, but we won't include the interim expansion at all, because we skip the interim expansion after the last argument. With two arguments, we'll expand the interim once, between the two. Here are some sample expansions:

  ADD()       ->

  ADD(1)      -> 1

  ADD(1,2)    -> 1+2

  ADD(1,2,3)  -> 1+2+3
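The iteration rules can be summarized in a Python model. This is a sketch of the behavior described above, not the preprocessor's code: expand_foreach is an invented name, the model works on raw text, so its whitespace may differ from real output, and it copies the interim text verbatim, which is an assumption.

```python
import re


def expand_foreach(body, name, args):
    """Expand each name#foreach<d>main<d>interim<d> sequence in `body`."""
    symbol = r"\b" + re.escape(name) + r"\b"

    def iterate(match):
        main, interim = match.group(2), match.group(3)
        pieces = []
        for index, arg in enumerate(args):
            # Inside the iteration the name stands for the current argument.
            pieces.append(re.sub(symbol, lambda m: arg, main))
            if index < len(args) - 1:
                pieces.append(interim)  # between arguments, never after the last
        return "".join(pieces)

    # The delimiter is any single non-symbol character after "#foreach".
    pattern = r"\b" + re.escape(name) + r"#foreach([^A-Za-z0-9_])(.*?)\1(.*?)\1"
    return re.sub(pattern, iterate, body, flags=re.DOTALL)
```

Applied to the expansion text of ADD, "val#foreach:val:+:", the model yields an empty string for no arguments, "1" for one argument, and "1+2+3" for three, matching the sample expansions above.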

Conditional expansion with #ifempty and #ifnempty

In some cases, it is necessary to include a block of text in a variable argument expansion immediately before or after the variable arguments, but only when the argument list is non-empty. In other cases, it is necessary to provide some text instead of the variable arguments when the variable argument list is empty. A pair of operators, #ifempty and #ifnempty, provide these types of conditional expansion.

The #ifempty and #ifnempty operators are similar in syntax to #foreach: these operators must appear in macro expansion text directly after the name of the variable argument formal parameter, with no intervening spaces, and the operator is immediately followed by a delimiter character. After the delimiter comes the conditional expansion text, which is terminated by another copy of the delimiter character.

#ifempty includes its expansion text in the macro's expansion only when the variable argument list is empty, and #ifnempty includes the text only when the variable argument list is non-empty.

For example, suppose you want to define a macro that expands its variable arguments into a concatenated list, and then passes the concatenated list as the second argument to another function. We might try defining this using #foreach:

  #define CALL_CONCAT(firstArg, args...) \
    myFunc(firstArg, args#foreach#args#+#)

However, this has a problem: if the varying argument part of the list is empty, we have an unnecessary comma in the expansion:

  CALL_CONCAT(test) -> myFunc(test, )

This is similar to the problem that we mentioned earlier in describing the "##" operator, but we can't use the "##" operator to delete the comma in this case, because the "##" comma deletion works only when the variable list argument appears directly after the ", ##" sequence.

This is where the #ifempty and #ifnempty operators come in. In this case, we want to include the comma after firstArg in the expansion only when the argument list isn't empty, so we can change the macro like this:

  #define CALL_CONCAT(firstArg, args...) \
    myFunc(firstArg args#ifnempty#,# args#foreach#args#+#)

This does what we want: when the variable argument list is empty, the #ifnempty expansion text is omitted, so we have no extra comma; when we have one or more varying arguments, the #ifnempty expansion is included, so the comma is included in the expansion.
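The two conditional operators can be modeled in Python in the same spirit. This is a sketch only: expand_conditionals is a hypothetical name, the model works on raw text, and it handles nothing except the #ifempty and #ifnempty sequences.

```python
import re


def expand_conditionals(body, name, args):
    """Expand name#ifempty<d>text<d> and name#ifnempty<d>text<d> in `body`."""

    def choose(match):
        operator, text = match.group(1), match.group(3)
        list_is_empty = len(args) == 0
        if operator == "ifempty":
            return text if list_is_empty else ""
        return "" if list_is_empty else text  # ifnempty

    pattern = (r"\b" + re.escape(name)
               + r"#(ifempty|ifnempty)([^A-Za-z0-9_])(.*?)\2")
    return re.sub(pattern, choose, body, flags=re.DOTALL)
```

With an empty list, "args#ifnempty#,#" disappears entirely, so no stray comma is left behind; with one or more arguments it expands to the comma.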

Getting the variable argument count with #argcount

There's one more feature for varying argument lists: you can obtain the number of varying arguments with the #argcount operator. Like #foreach, the #argcount operator must appear immediately after the name of the varying parameter, without any spaces. This operator expands to a token giving the number of arguments in the varying list. For example:

  #define MAKELIST(ret, val...) ret = [val#argcount val#foreach#,val##]

  MAKELIST(lst)            -> lst = [0]

  MAKELIST(lst, 'a')       -> lst = [1,'a']

  MAKELIST(lst, 'a', 'b')  -> lst = [2,'a','b']

Note that #argcount expands to the number of arguments in the varying part of the list only, and doesn't count any fixed arguments.

Stringizing

It is sometimes useful to write a macro that uses the actual text of a substitution parameter as a string constant. This can be accomplished using the "stringizing" operators. The # operator, when it precedes the name of a macro formal parameter in macro expansion text, is replaced by the text of the actual argument value enclosed in double quotes. The #@ operator has a similar effect, but encloses the text in single quotes. For example, suppose we wanted to write a debugging macro that displays the value of an arbitrary expression:

#define printval(val) tadsSay(#@val + ' = ' + toString(val))

We could use this macro in our code like this:

    printval(MyObject.codeNum);

This would expand as follows:

    tadsSay('MyObject.codeNum' + ' = ' + toString(MyObject.codeNum));

Token Pasting

In some cases, it is useful to be able to construct a new symbol out of different parts. This can be accomplished with "token pasting," which constructs a single token from what were originally several tokens. The token pasting operator, ##, when it appears in a macro's expansion text, takes the text of the token to the left of the operator and the text of the token to the right of the operator and pastes them together to form a single token. If the token on either side is a formal parameter to the macro, the operator first expands the formal parameter, then performs pasting on the result.

For example, suppose we wanted to construct a method call based on a partial method name:

#define callDo(verb, actor)  do##verb(actor)

We could use the macro like this:

   dobj.callDo(Take, Me);

This would expand into this text:

   dobj.doTake(Me);

The preprocessor scans a pasted token for further expansion, so if the pasted token is itself another macro, the preprocessor expands that as well:

#define PASTE(a, b) a##b

#define FOOBAR 123

PASTE(FOO, BAR)

The macro above expands as follows. First, the preprocessor expands the PASTE macro, pasting the two arguments together to yield the token FOOBAR. The preprocessor then scans that and finds that it's another macro, so it expands it. The final text is simply 123.

Token pasting only works within macro expansion text; the token pasting operator is ignored if it appears anywhere outside of a #define.

String Concatenation

When you use the ## operator to paste two tokens together, the preprocessor checks to see if both of the tokens being pasted together are strings of the same kind (i.e., they both have the same type of quotes). If they are, the preprocessor combines the strings by removing the closing quote of the first string and the opening quote of the second string.

If either operand of the ## operator is itself modified by the # operator, the preprocessor first applies the # operator or operators, and then applies the ## operator. So, if you paste together two stringized parameters, the result is a single string.

Here are some examples:

#define PAREN_STR(a) "(" ## a ## ")"

#define CONCAT(a, b) a ## b

#define CONCAT_STR(a, b) #a ## #b

#define DEBUG_PRINT(a) "value of " ## #a ## " = <<a>>"

1: PAREN_STR("parens")

2: CONCAT("abc", "def")

3: CONCAT_STR(uvw, xyz)

4: DEBUG_PRINT(obj.prop[3])

After preprocessing, the file above would appear as follows:

1: "(parens)"

2: "abcdef"

3: "uvwxyz"

4: "value of obj.prop[3] = <<obj.prop[3]>>"
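The pasting and stringizing rules can be modeled with two small Python helpers. This is a sketch with invented names (is_string, paste, stringize), operating on token text that has already had macro parameters substituted. The text above doesn't say what happens when strings of different kinds are pasted, so the model simply glues the token text in that case; that part is an assumption.

```python
def is_string(token, quote):
    """True if `token` is a string literal delimited by `quote`."""
    return len(token) >= 2 and token[0] == quote and token[-1] == quote


def stringize(text, quote='"'):
    """Model # (double quotes) or #@ (single quotes) applied to argument text."""
    return quote + text + quote


def paste(left, right):
    """Model the ## operator applied to two tokens."""
    for quote in ('"', "'"):
        if is_string(left, quote) and is_string(right, quote):
            # Same kind of string: drop the closing quote of the first
            # and the opening quote of the second.
            return left[:-1] + right[1:]
    # Otherwise glue the token text together to form one token.
    return left + right
```

Pasting "abc" and "def" gives the single string "abcdef", and pasting the stringized forms of uvw and xyz gives "uvwxyz", as in examples 2 and 3 above. Ordinary symbols such as FOO and BAR are simply joined into FOOBAR.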

Note that string concatenation is a TADS extension, and is not found in ANSI C preprocessors. The C preprocessor doesn't provide a way of combining string tokens because the C language (not the preprocessor, but the language itself) has a different way of accomplishing the same thing: in C, two adjacent string tokens are always treated as a single string formed by concatenating the two strings together. The TADS language doesn't allow this kind of implicit string pasting, because (unlike in C) there are times when it is valid to use two or more adjacent string tokens, such as in dictionary property lists. The TADS preprocessor therefore provides its own mechanism for concatenating string tokens.

Conditionals: #if and #elif

The TADS 3 preprocessor supports the C-style #if and #elif directives. These directives let you specify sections of text to be compiled or omitted conditionally, based on the value of a constant expression. (TADS 2 supported conditional compilation, but only through the #ifdef and #ifndef directives, which could only determine if a preprocessor symbol was defined or not.) These new directives work the same as they do in ANSI C: the preprocessor expands macros in the argument and evaluates the result as a constant expression; a non-zero value is considered true. Here's an example that checks the version of an included library header to make sure it's recent enough:

#include "MyLib.h"

#if MYLIB_VSN < 5

#error "This module requires MyLib version 5 or higher."

#endif

Note that the defined() preprocessor operator can be used within an expression in these directives to test whether a preprocessor symbol is defined. This allows for tests of combinations of defined symbols:

#if defined(MSDOS) || defined(AMIGA) || defined(UNIX)

The #line Directive

The preprocessor supports the C-style #line directive, which lets you override the compiler's internal notion of the current source filename and line number. This feature is probably of little use to most game and library authors, since the compiler automatically keeps track of the actual filename and line number as it processes the source code.

The #line directive will be of greatest interest to tool writers. If you're writing a tool that adds an extra stage of preprocessing before the TADS 3 compiler sees the source code, you can use #line to specify the original source location for generated code. This will allow the compiler to generate error messages that relate back to the original source code, and will allow the debugger to display the original source code rather than the generated TADS code.
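As an illustration of the tool-writer case, here is a Python sketch of a generator that tags each piece of generated code with its original location. The function name with_line_markers and the chunk layout are hypothetical. The directive form used, #line followed by a line number and a quoted filename, is the form C uses; it is assumed here to carry over because the directive is described as C-style.

```python
def with_line_markers(chunks):
    """Prefix each generated chunk with a #line directive.

    chunks: iterable of (source_file, source_line, generated_text) tuples,
    where source_file and source_line give the original location.
    """
    lines = []
    for source_file, source_line, text in chunks:
        lines.append(f'#line {source_line} "{source_file}"')
        lines.extend(text.splitlines())
    return "\n".join(lines)
```

A tool would write the returned text to the file it hands to the compiler, so that any error in a generated chunk is reported against the original file and line.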