The ByteArray Intrinsic Class

Most TADS programs work with the high-level types that TADS provides – integers, strings, lists, objects, and so on. In some cases, though, it's necessary to manipulate the raw bytes that form the basic units of storage on modern computers. The ByteArray class provides a structured way of working directly with bytes.

A ByteArray looks superficially similar to a Vector object, in that you can access the individual byte elements of a ByteArray using the square bracket indexing operator:

  local arr = new ByteArray(100);

  arr[5] = 12;

Note, though, that the elements of a ByteArray can only store byte values, which are represented as integers in the range 0 to 255.

Creating a ByteArray

You create a ByteArray object using the new operator. You must pass to the constructor the number of bytes you want to allocate for the new object; this can be any value from 1 to approximately 2 billion. For example, to create a byte array with 1,024 elements, you would write this:

  local arr = new ByteArray(1024);

The size of a ByteArray is fixed at creation and cannot change afterward.

You can also create a ByteArray as a copy of another byte array or a portion of another byte array:

  arr = new ByteArray(otherArray, startIndex, len);

The startIndex and len parameters are optional; if they're missing, the new byte array will simply be a complete copy of the existing byte array. If startIndex and len are provided, the new array will be a copy of the region of the other byte array starting at index startIndex and continuing for len bytes. If startIndex is specified but len is missing, the new array will consist of all of the bytes from the original starting with startIndex and continuing to the end of the original array.
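The three constructor forms can be compared side by side. The following is only a sketch: it assumes the usual TADS 1-based indexing, that the ByteArray class is declared in the system header bytearr.h, and that the program uses the default main(args) entry point.

```tads3
#include <tads.h>
#include <bytearr.h>

main(args)
{
    /* build a 10-byte source array holding the values 1 through 10 */
    local src = new ByteArray(10);
    for (local i = 1 ; i <= src.length() ; ++i)
        src[i] = i;

    /* no startIndex or len: a complete copy of all 10 bytes */
    local all = new ByteArray(src);

    /* startIndex and len: the 3 bytes at indices 4, 5, and 6 */
    local mid = new ByteArray(src, 4, 3);

    /* startIndex only: index 8 through the end of the source (3 bytes) */
    local tail = new ByteArray(src, 8);

    /* display the three lengths, then the first byte of each partial copy */
    "<<all.length()>> <<mid.length()>> <<tail.length()>>\n";
    "<<mid[1]>> <<tail[1]>>\n";
}
```

Under the rules described above, the first line displayed should be 10 3 3 and the second 4 8.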

Reference Semantics

Like regular Array objects, a ByteArray has reference semantics: when you change a value in a byte array, any other variables that refer to the same ByteArray will refer to the modified version of the array.
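A short sketch of what reference semantics means in practice, under the same assumptions as elsewhere in this section (bytearr.h header, main(args) entry point, 1-based indexing). Assigning a ByteArray to a second variable does not copy it; the copy constructor does.

```tads3
#include <tads.h>
#include <bytearr.h>

main(args)
{
    local a = new ByteArray(4);

    /* b refers to the same object as a; no bytes are copied */
    local b = a;

    /* a change made through a is visible through b */
    a[1] = 200;
    "<<b[1]>>\n";

    /* the copy constructor creates a separate object */
    local c = new ByteArray(a);

    /* changing a no longer affects the independent copy c */
    a[1] = 7;
    "<<c[1]>>\n";
}
```

Both lines displayed should be 200: the first because b and a are the same object, the second because c was copied before a changed.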

Reading and Writing Raw Files

One of the tasks for which ByteArray objects are uniquely suited is working with files stored in a format defined by another application. Using ByteArray objects, you can work directly with the exact bytes stored in an external file, allowing you to process data in arbitrary binary formats.

To read or write a file using ByteArray objects, you must open the file in "raw" mode. Once a file is opened in raw mode, you can use the fileRead() and fileWrite() methods to read bytes from the file into a ByteArray and to write bytes from a ByteArray into the file. Refer to the "tads-io" intrinsic function set for information on the file input/output functions.

ByteArray Methods

copyFrom(sourceArray, sourceStartIndex, destStartIndex, length) – copies bytes from sourceArray, which must be a ByteArray object. The method copies length bytes starting with the byte in sourceArray indexed by sourceStartIndex, and stores them in this array starting at the byte indexed by destStartIndex.

This routine is safe to use even if sourceArray is the same object as the target, and even if the source and destination ranges overlap. When copying between overlapping regions of the same array, the routine moves the bytes so that no source byte is overwritten before it has been copied.
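As an illustration of the overlapping case, the sketch below shifts part of an array forward within itself. It assumes 1-based indexing, the bytearr.h header, and a main(args) entry point.

```tads3
#include <tads.h>
#include <bytearr.h>

main(args)
{
    /* fill a 6-byte array with the values 1 through 6 */
    local arr = new ByteArray(6);
    for (local i = 1 ; i <= arr.length() ; ++i)
        arr[i] = i;

    /*
     *   Copy 4 bytes from index 1 to index 3 of the same array.  The
     *   source range (1-4) overlaps the destination range (3-6).
     */
    arr.copyFrom(arr, 1, 3, 4);

    /* display the resulting contents */
    for (local i = 1 ; i <= arr.length() ; ++i)
        "<<arr[i]>> ";
    "\n";
}
```

Because the source bytes are preserved during the move, the array should end up holding 1 2 1 2 3 4, not a smeared repetition of the first two bytes.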

fillValue(val, startIndex?, length?) – stores the value val in each element of the array, starting at index startIndex and filling the next length bytes. If startIndex and length are missing, val is stored in every element of the array. If startIndex is given but length is missing, val is stored in every element from startIndex to the end of the array. The value val must be an integer in the range 0 to 255.
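The three ways of calling fillValue() might be used as follows. This is a sketch assuming 1-based indexing, the bytearr.h header, and a main(args) entry point.

```tads3
#include <tads.h>
#include <bytearr.h>

main(args)
{
    local arr = new ByteArray(8);

    /* no range: every element is set to 0 */
    arr.fillValue(0);

    /* startIndex only: elements 5 through 8 are set to 255 */
    arr.fillValue(0xFF, 5);

    /* startIndex and length: elements 2, 3, and 4 are set to 32 */
    arr.fillValue(0x20, 2, 3);

    /* display the resulting contents */
    for (local i = 1 ; i <= arr.length() ; ++i)
        "<<arr[i]>> ";
    "\n";
}
```

The contents displayed should be 0 32 32 32 255 255 255 255.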

length() – returns the number of bytes in the ByteArray. This is the same as the size specified when the object was created.

mapToString(charset, startIndex?, length?) – maps the bytes in the array to a Unicode string, interpreting the bytes as belonging to the character set given by charset, which must be an object of class CharacterSet. Returns a string with the result of the character mapping. Only the bytes starting at index startIndex and running for length bytes are included in the mapping. If startIndex and length are missing, all of the bytes in the array are mapped. If startIndex is given but length is missing, the bytes from startIndex to the end of the array are included in the mapping.

The character set given by charset must be known. If the character set is not known, an UnknownCharSetException is thrown. You can determine if the character set is known using the isMappingKnown() method of charset.
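The following sketch checks that a mapping is known before converting bytes to a string. It assumes that the CharacterSet class is declared in the system header charset.h, that a CharacterSet is created by passing the character set's name to its constructor, and that the name 'us-ascii' is recognized by the interpreter; adjust these to match your system.

```tads3
#include <tads.h>
#include <bytearr.h>
#include <charset.h>

main(args)
{
    /* store the ASCII codes for "Hello" */
    local arr = new ByteArray(5);
    arr[1] = 72;
    arr[2] = 101;
    arr[3] = 108;
    arr[4] = 108;
    arr[5] = 111;

    /* 'us-ascii' is an assumed character set name */
    local cs = new CharacterSet('us-ascii');

    /* check first, to avoid an UnknownCharSetException */
    if (cs.isMappingKnown())
    {
        /* map the whole array */
        "<<arr.mapToString(cs)>>\n";

        /* map only the 3 bytes starting at index 2 */
        "<<arr.mapToString(cs, 2, 3)>>\n";
    }
    else
        "No mapping is available for this character set.\n";
}
```

If the mapping is known, this should display Hello followed by ell.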

readInt(startIndex, format) – translates bytes from the byte array into an integer value. Reads from the byte array starting at the byte given by startIndex, and reads the number of bytes implied by the format code given by format, which also indicates how the bytes should be interpreted into an integer value. The return value is the integer value read and translated from the byte array.

The format code given by format is a bit-wise combination of three parts: a size, a byte order, and a signedness:

The size gives the number of bits in the integer; this can be one of the values FmtSize8, FmtSize16, or FmtSize32, indicating 8-bit, 16-bit, and 32-bit values, respectively.
The byte order can be FmtBigEndian or FmtLittleEndian. A big-endian value is stored with its most significant byte first, followed by the second-most significant byte, and so on. A little-endian value is stored in the opposite order, with its least significant byte first. The readInt() method makes it possible to specify the desired byte ordering because the native byte ordering of different hardware platforms varies, and as a result, the ordering of bytes in data fields in file formats specified by third-party applications can vary. Note that the byte order is irrelevant in the case of 8-bit values, since an 8-bit value requires only one byte in the byte array.
The signedness indicates whether the integer is to be interpreted as signed or unsigned; this can be FmtSigned or FmtUnsigned. Note that the T3 VM doesn't have an unsigned 32-bit datatype, so FmtUnsigned isn't meaningful with FmtSize32.

So, to specify a signed 16-bit value in big-endian byte order, you'd use (FmtSize16 | FmtSigned | FmtBigEndian).

It's a lot of typing to specify all three parts of a data format, so the byte array system header file defines all of the useful combinations as individual macros:

FmtInt8 (signed 8-bit integer)
FmtUInt8 (unsigned 8-bit integer)
FmtInt16LE (signed 16-bit integer in little-endian byte order)
FmtUInt16LE (unsigned 16-bit integer in little-endian byte order)
FmtInt16BE (signed 16-bit integer in big-endian byte order)
FmtUInt16BE (unsigned 16-bit integer in big-endian byte order)
FmtInt32LE (signed 32-bit integer in little-endian byte order)
FmtInt32BE (signed 32-bit integer in big-endian byte order)

This method simply reads the bytes out of the byte array and translates them according to the format specification. There is no information in the byte array itself that indicates how the bytes are to be interpreted into an integer, so it is up to your program to specify the correct format translation. You'll get strange results if you attempt to read values in a format different from the format that was used to write them.
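To make the effect of the format code concrete, the sketch below reads the same bytes under several formats. It assumes 1-based indexing, that the Fmt macros come from the bytearr.h header, and a main(args) entry point.

```tads3
#include <tads.h>
#include <bytearr.h>

main(args)
{
    /* four raw bytes: 12 34 FF FE (hexadecimal) */
    local arr = new ByteArray(4);
    arr[1] = 0x12;
    arr[2] = 0x34;
    arr[3] = 0xFF;
    arr[4] = 0xFE;

    /* bytes 1-2, big-endian: 0x1234 = 4660 */
    "<<arr.readInt(1, FmtUInt16BE)>>\n";

    /* the same bytes, little-endian: 0x3412 = 13330 */
    "<<arr.readInt(1, FmtUInt16LE)>>\n";

    /* bytes 3-4 as unsigned big-endian: 0xFFFE = 65534 */
    "<<arr.readInt(3, FmtUInt16BE)>>\n";

    /* the same bytes as signed big-endian: -2 */
    "<<arr.readInt(3, FmtInt16BE)>>\n";

    /* the spelled-out combination is equivalent to FmtInt16BE: 4660 */
    "<<arr.readInt(1, FmtSize16 | FmtSigned | FmtBigEndian)>>\n";
}
```

The bytes never change between calls; only the format code passed to readInt() determines the integer you get back.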

subarray(startIndex, length?) – returns a new ByteArray consisting of the region of this array that starts with the byte indexed by startIndex and runs for the number of bytes given by length. If length is not supplied, the new ByteArray consists of all of the bytes from startIndex to the last byte of this array.
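A brief sketch of both forms of subarray(), assuming 1-based indexing, the bytearr.h header, and a main(args) entry point:

```tads3
#include <tads.h>
#include <bytearr.h>

main(args)
{
    /* fill a 10-byte array with the values 1 through 10 */
    local arr = new ByteArray(10);
    for (local i = 1 ; i <= arr.length() ; ++i)
        arr[i] = i;

    /* 4 bytes starting at index 3: the values 3, 4, 5, 6 */
    local part = arr.subarray(3, 4);

    /* index 8 through the end: the values 8, 9, 10 */
    local rest = arr.subarray(8);

    /* display the length of each new array */
    "<<part.length()>> <<rest.length()>>\n";

    /* the result is a new object, so changing it leaves arr alone */
    part[1] = 99;
    "<<arr[3]>>\n";
}
```

This should display 4 3, and then 3, since the subarray is a separate ByteArray.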

writeInt(startIndex, format, val) – translates an integer value into a series of bytes, and writes the bytes into the array. The bytes are written starting at the index given by startIndex. The number of bytes written is the byte size implied by the format code given by format. The val argument gives the integer value to be written.

The format code in format has the same meaning as the format code in readInt().
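The sketch below writes one 32-bit value in each byte order and displays the individual bytes that result. It assumes 1-based indexing, the bytearr.h header, and a main(args) entry point.

```tads3
#include <tads.h>
#include <bytearr.h>

main(args)
{
    local arr = new ByteArray(4);

    /* little-endian: least significant byte first (78 56 34 12 hex) */
    arr.writeInt(1, FmtInt32LE, 0x12345678);
    "<<arr[1]>> <<arr[2]>> <<arr[3]>> <<arr[4]>>\n";

    /* big-endian: most significant byte first (12 34 56 78 hex) */
    arr.writeInt(1, FmtInt32BE, 0x12345678);
    "<<arr[1]>> <<arr[2]>> <<arr[3]>> <<arr[4]>>\n";
}
```

In decimal, the first line should read 120 86 52 18 and the second 18 52 86 120.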

Note that this method doesn't perform any range checking on val. If val is outside the limits that can be represented with the specified format code, the method simply truncates the stored value to its low-order portion, discarding any high-order bits that won't fit the format. For example, if you attempt to store 1000 in an unsigned 8-bit format, the value stored is 232: 1000 is 3E8 in hexadecimal, so truncating it to 8 bits leaves E8 in hex, which is 232 in decimal. If you later read this value back as a signed 8-bit value, the result is even stranger: -24. This is because E8 has its high bit set and is therefore negative when interpreted as signed, so it is sign-extended to the integer 0xFFFFFFE8, which is -24. If you need range checking, your program must provide it. Here are the limits of the different types:

Signed 8-bit: -128 to 127
Unsigned 8-bit: 0 to 255
Signed 16-bit: -32768 to 32767
Unsigned 16-bit: 0 to 65535
Signed 32-bit: -2147483648 to 2147483647

The capacity of a type doesn't depend on its byte order. Note that there should be no need for range checking on a 32-bit value, since the T3 VM's internal integer type itself is a 32-bit signed value and thus can't exceed this range to begin with.
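Since writeInt() leaves range checking to the program, one possible approach is a small wrapper that tests the value against the limits listed above before writing. The function writeIntChecked() below is hypothetical, not part of the ByteArray class; the sketch also assumes 1-based indexing, the bytearr.h header, and a main(args) entry point.

```tads3
#include <tads.h>
#include <bytearr.h>

/*
 *   Hypothetical helper: write val only if it lies within lo..hi.
 *   Returns true if the value was written, nil if it was rejected.
 */
writeIntChecked(arr, idx, fmt, val, lo, hi)
{
    if (val < lo || val > hi)
        return nil;

    arr.writeInt(idx, fmt, val);
    return true;
}

main(args)
{
    local arr = new ByteArray(2);

    /* unchecked: 1000 (0x3E8) is silently truncated to 0xE8 */
    arr.writeInt(1, FmtUInt8, 1000);

    /* read back as unsigned 8-bit: 232 */
    "<<arr.readInt(1, FmtUInt8)>>\n";

    /* read back as signed 8-bit: -24 */
    "<<arr.readInt(1, FmtInt8)>>\n";

    /* checked: the same value is rejected against the limits 0 to 255 */
    if (writeIntChecked(arr, 2, FmtUInt8, 1000, 0, 255))
        "stored\n";
    else
        "1000 is out of range for an unsigned 8-bit value\n";
}
```

Passing the limits as arguments keeps the helper independent of any particular format code; a program that uses only a few formats might prefer one wrapper per format instead.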

This method stores only the bytes of the translated integer value. It doesn't store any information on the format code used to generate the value; this means that if you later want to read the integer value back out of the byte array, it will be up to your program to specify the correct format code.