Lemmings 2 File Formats by GuyPerfect

Like we did for Revolution, I'm starting a thread to document the Lemmings 2 data files and their contents. I am aware that there is existing documentation out there, but I don't feel that it's quite verbose enough to make full sense of what the files truly represent, so I'm starting from scratch. However, I would like to thank geoo, Mindless and any other contributors ahead of time, as I am using those documents for reference.
__________

Compression Format

Integer data types are little-endian.

File
===============================================================================
Identifier      String*4     ASCII "GSCM"
UnpackedSize    UInt32       Total size of decompressed data
Chunks          Chunk[]      Compressed data chunks
===============================================================================

Chunk
===============================================================================
LastChunkFlag   Byte         Last chunk = 0xFF; 0x00 otherwise
SymbolCount     UInt16       Number of symbol definitions
SymbolList      Byte[Count]  Indexes of the symbols to redefine
SymbolValuesA   Byte[Count]  Index of the first symbol to concatenate
SymbolValuesB   Byte[Count]  Index of the second symbol to concatenate
DataSize        UInt16       Number of bytes in encoded data
Data            Byte[Size]   Encoded data bytes
===============================================================================

At the beginning of each chunk, a 256-symbol dictionary is initialized.
Dictionary entries are indexed 0x00 through 0xFF, and each symbol is
initialized to the single byte value that matches its index. The symbol
definitions are then modified by concatenating symbols as specified by the
lists in the chunk header.

For each entry in SymbolList, the symbol at that index is redefined by
concatenating the two symbols whose indexes are given by the corresponding
entries in SymbolValuesA and SymbolValuesB. All three lists run in parallel to
one another: the lists are all the same length, and the item at position X in
one list is to be used with the item at position X in the other two lists.

The following specifies the algorithm for each item X in the lists, from 0 to
SymbolCount - 1. Symbols must be redefined in that order. Let "+" represent a
byte-wise concatenation operator:

Symbol[List[X]] = Symbol[ValuesA[X]] + Symbol[ValuesB[X]]

When bytes are processed from Data, they represent the indexes of symbols in
the dictionary. The symbols are copied wholesale, in the order they are
specified in Data, to the output. This takes place after the symbols are
redefined using the symbol lists.

Chunks will continue to be processed until a chunk with a non-zero
LastChunkFlag has finished processing.

In the event data is not compressed, it will not begin with the "GSCM"
identifier. Use the data as-is in this case. Both compressed and uncompressed
data can be used in most contexts, so they are processed according to whether
that "GSCM" identifier exists or not.
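The whole procedure above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation: the function name gscm_decompress is mine, it assumes well-formed input, and the final length check against UnpackedSize is just a sanity check I added.

```python
import struct

def gscm_decompress(data: bytes) -> bytes:
    """Decompress GSCM data; data without the "GSCM" identifier is returned as-is."""
    if data[:4] != b"GSCM":
        return data                              # uncompressed: use as-is
    (unpacked_size,) = struct.unpack_from("<I", data, 4)
    pos = 8
    out = bytearray()
    while True:
        last_flag = data[pos]
        (count,) = struct.unpack_from("<H", data, pos + 1)
        pos += 3
        symbol_list = data[pos:pos + count]
        values_a = data[pos + count:pos + 2 * count]
        values_b = data[pos + 2 * count:pos + 3 * count]
        pos += 3 * count
        # Fresh dictionary for every chunk: symbol N is the single byte N.
        symbols = [bytes([i]) for i in range(256)]
        # Redefine symbols strictly in list order, 0 .. SymbolCount - 1.
        for dst, a, b in zip(symbol_list, values_a, values_b):
            symbols[dst] = symbols[a] + symbols[b]
        (data_size,) = struct.unpack_from("<H", data, pos)
        pos += 2
        # Each encoded byte is a symbol index; copy that symbol to the output.
        for index in data[pos:pos + data_size]:
            out += symbols[index]
        pos += data_size
        if last_flag != 0:                       # stop after the last chunk
            break
    if len(out) != unpacked_size:                # sanity check (my addition)
        raise ValueError("decompressed size does not match UnpackedSize")
    return bytes(out)
```

Note that the dictionary is rebuilt from scratch for each chunk, so a redefinition in one chunk does not carry over into the next.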
__________

Data File

Many of the data files in the game are packed into a structured archive format, where the file is broken up into sections. This includes the .dat files for graphics and levels.

Integer data types are big-endian.

File
===============================================================================
Identifier      String*4     ASCII "FORM"
DataSize        UInt32       Size of the remainder of the file
DataType        String*4     ASCII identifier for the data file type
Section         Section[]    File sections containing additional data
===============================================================================

Section
===============================================================================
Identifier      String*4     ASCII identifier for the current section
DataSize        UInt32       Size of the remainder of the section
Data            Byte[Size]   Data specific to the type of section
===============================================================================

The number of sections in the file depends on the respective sizes of existing
sections as well as the total data size reported in the file header.
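Walking the sections until the reported size is used up might look like this in Python. It is a sketch under a few assumptions: the function name parse_form is mine, the sizes are read as big-endian (as noted for the FORM data elsewhere in this thread), and sections are assumed to follow one another with no padding bytes in between.

```python
import struct

def parse_form(data: bytes):
    """Split a FORM file into (DataType, [(section_id, section_bytes), ...])."""
    if data[:4] != b"FORM":
        raise ValueError("missing FORM identifier")
    (data_size,) = struct.unpack_from(">I", data, 4)   # big-endian size
    end = 8 + data_size               # DataSize counts everything after itself
    data_type = data[8:12].decode("ascii")
    sections = []
    pos = 12
    while pos + 8 <= end:             # room for another section header
        section_id = data[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack_from(">I", data, pos + 4)
        sections.append((section_id, data[pos + 8:pos + 8 + size]))
        pos += 8 + size               # assumes no padding between sections
    return data_type, sections
```

A list is returned rather than a dictionary so that the original section order is kept and nothing is lost if an identifier happens to repeat.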
__________

Graphics Representation

For 256-color graphics, palettes are specified and pixels are plotted on the screen. Pixel information is specified in the "styles" data files as well as the full-screen background images. These images specify a particular pixel order, which is demonstrated with the following animation:



Pixels are drawn every fourth pixel, starting with the top-left pixel of the image. Pixels proceed left-to-right, skipping three pixels each time. All rows of pixels are drawn in this manner, from top to bottom. Once the end of the last row is reached, drawing moves back to the top row, but starts from the second pixel from the left. As before, every fourth pixel is drawn for every row of pixels. After four passes, every pixel in the image has been drawn.

Let's say you have a byte buffer containing pixel data for an image. Each byte is one pixel. The pixel Byte within that buffer, designated as Data[Byte], can be translated to X and Y coordinates with the following general formula:

Operators:
= Assignment
/ Integer division
% Remainder division
* Multiplication
+ Addition

Stride = Width / 4
Pass = Byte / (Stride * Height)
X = Byte % Stride * 4 + Pass
Y = Byte / Stride % Height

Here, Byte is the current byte in the buffer, starting at index 0. Stride represents the number of pixels drawn per scanline in a single pass. Pass indicates the current pass of drawing pixels for each scanline. Width and Height are the final dimensions of the image, and X and Y are the pixel coordinates within that image, relative to the top-left corner, of the pixel specified by the current byte.
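As a quick Python translation of the formula (the function name pixel_coords is mine, and it assumes Width is a multiple of 4, as the four-pass scheme implies):

```python
def pixel_coords(byte: int, width: int, height: int) -> tuple:
    """Map a byte index in a 4-pass interleaved buffer to (X, Y) in the image."""
    stride = width // 4                    # pixels per scanline in one pass
    pass_num = byte // (stride * height)   # which of the 4 passes (0-3)
    x = byte % stride * 4 + pass_num
    y = byte // stride % height
    return x, y
```

For a 16x8 tile, byte 1 lands at (4, 0) and byte 32, the first byte of the second pass, lands at (1, 0).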

Tiles in level data are 16x8 pixels in size. Plugging in those numbers produces the following formula:

Stride = 16 / 4
(Stride = 4)
Pass = Byte / (4 * 8)
X = Byte % 4 * 4 + Pass
Y = Byte / 4 % 8

...

X = Byte % 4 * 4 + Byte / 32
Y = Byte / 4 % 8

This is a somewhat complex formula for the X and Y coordinates, and it cannot be simplified any further. However, processing pixel data can be simplified by taking a different approach.

Let's now say that you read pixel data from the original byte buffer, called Data, and you want to re-order them and store them in a new byte buffer, called Output. To do this, we have the input byte index specified as Data[Byte], and the output byte index specified as Output[Pos]. From here, we get the simple end goal formula:

Output[Pos] = Data[Byte]

The calculation of Pos, therefore, is necessary. Using the earlier X and Y pixel coordinates, Pos can be calculated accordingly:

Pos = Y * Width + X

Substituting for X, Y and Width yields a pretty ugly expression, but it can be simplified. Take a look:

Operators:
<< Bitwise left shift
>> Bitwise right shift
| Bitwise OR
& Bitwise AND

Pos = (Byte / 4 % 8) * 16 +
(Byte % 4 * 4 + Byte / 32)

...

Pos = (((Byte >> 2) & 7) << 4) +
(( Byte & 3) << 2) + (Byte >> 5)

...

Pos = ((Byte & 28) << 2) +
((Byte & 3) << 2) + (Byte >> 5)

...

Pos = ((Byte & 31) << 2) + (Byte >> 5)

...

Pos = Byte % 32 * 4 + Byte / 32

How 'bout them apples? Turns out both of those 32s in the expression come from the following expression:

Width * Height / 4

Meaning our final, unadulterated formula looks like this:

Quarter = Width * Height / 4
Output[Byte % Quarter * 4 + Byte / Quarter] = Data[Byte]
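In Python, the final formula turns into a short re-ordering loop. This is a sketch; the function name deinterleave is mine, and it assumes Width * Height is a multiple of 4.

```python
def deinterleave(data: bytes, width: int, height: int) -> bytes:
    """Re-order 4-pass pixel data into linear left-to-right, top-to-bottom order."""
    quarter = width * height // 4          # bytes per pass
    output = bytearray(width * height)
    for byte in range(width * height):
        output[byte % quarter * 4 + byte // quarter] = data[byte]
    return bytes(output)
```

For an 8x2 image whose input bytes are 0 through 15, the first output row comes out as 0, 4, 8, 12, 1, 5, 9, 13: pass 0 fills columns 0 and 4, pass 1 fills columns 1 and 5, and so on.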

This works not only for the level tiles, but any of the graphics stored in the same general format:


__________

Style Palette - L2CL

This data section is found within a FORM data file and has the section name "L2CL"

The palette is expressed as 128 RGB triplets. These represent colors 0-127 in the global palette. The other 128 entries are used by the GUI.

Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2CL
===============================================================================
(Unknown)       UInt16       Unknown. Seems to always be 0x0001.
Palette         RGB[128]     RGB color data
===============================================================================

RGB
===============================================================================
Red             Byte         Red channel (lower 6 bits only)
Green           Byte         Green channel (lower 6 bits only)
Blue            Byte         Blue channel (lower 6 bits only)
===============================================================================
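Reading the section might look like this in Python. This is a sketch: the function names are mine, and expanding the 6-bit values to 8 bits with (v << 2) | (v >> 4) is just one common convention for displaying VGA-style palettes on modern hardware, not something stored in the file.

```python
def vga6_to_8(value: int) -> int:
    """Expand a 6-bit channel (0-63) to 8 bits (0-255)."""
    value &= 0x3F                       # only the lower 6 bits are meaningful
    return (value << 2) | (value >> 4)  # 0 -> 0, 63 -> 255

def parse_l2cl(section: bytes) -> list:
    """Return 128 (r, g, b) tuples for global palette entries 0-127."""
    palette = []
    offset = 2                          # skip the UInt16 of unknown purpose
    for _ in range(128):
        r, g, b = section[offset:offset + 3]
        palette.append((vga6_to_8(r), vga6_to_8(g), vga6_to_8(b)))
        offset += 3
    return palette
```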

The palette from MEDIEVAL.DAT:


__________

Style Tiles - L2BL

This data section is found within a FORM data file and has the section name "L2BL"

Most map features, both animated and not, are built up from a number of 16x8-pixel "blocks" of tile data.

Pixels with color value 0 are considered to be "air", while all other pixels are "solid".

Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2BL
===============================================================================
TileCount       UInt16       The total number of tiles in the style.
Tiles           Tile[]       Tile pixel data
===============================================================================

Tile
===============================================================================
Pixels          Byte[128]    1 byte per pixel, refers to palette indexes.
===============================================================================

To re-order the pixel buffer so that pixels are stored linearly left-to-right
and top-to-bottom, process every byte B in the Pixels array accordingly:

LinearPixels[(B % 32) * 4 + B / 32] = Pixels[B]

In the above expression, % represents remainder division.
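Putting the section layout and the re-ordering together, a Python sketch could look like this. The names parse_l2bl and is_solid are mine; is_solid simply applies the "color 0 is air" rule from above.

```python
import struct

TILE_W, TILE_H = 16, 8

def parse_l2bl(section: bytes) -> list:
    """Return each tile as 128 linear bytes (left-to-right, top-to-bottom)."""
    (count,) = struct.unpack_from("<H", section, 0)
    tiles = []
    for t in range(count):
        raw = section[2 + 128 * t:2 + 128 * (t + 1)]
        linear = bytearray(128)
        for b in range(128):
            linear[(b % 32) * 4 + b // 32] = raw[b]
        tiles.append(bytes(linear))
    return tiles

def is_solid(tile: bytes, x: int, y: int) -> bool:
    """Palette index 0 is air; anything else is solid terrain."""
    return tile[y * TILE_W + x] != 0
```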

The tiles from MEDIEVAL.DAT:


__________

Style Presets - L2BE

This data section is found within a FORM data file and has the section name "L2BE"

Styles are packaged with "presets", rectangular groups of tiles that are useful for creating level terrain without specifying tiles individually. The game engine does not use these presets, however; levels are instead stored as large groups of individual tiles. These presets are only useful to level editors.

Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2BE
===============================================================================
PresetCount     UInt16       The total number of presets in the style.
Presets         Preset[]     Preset definitions.
===============================================================================

Preset
===============================================================================
(Unknown1)      Byte         Unknown. Possibly used by editor.
(Unknown2)      Byte         Unknown. Possibly used by editor.
Width           Byte         Number of tiles wide.
Height          Byte         Number of tiles tall.
DataSize        UInt16       Total size of this Preset, including header.
Tiles           UInt16[]     Tile indexes (from L2BL).
===============================================================================

Tiles are arranged within presets in the order of left-to-right then
top-to-bottom.
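A Python sketch for reading the presets into rows of tile indexes follows. The name parse_l2be is mine. It assumes the header is the 6 bytes listed above (four Bytes plus DataSize) and that there are exactly Width * Height tile indexes; DataSize is only used to step to the next preset.

```python
import struct

def parse_l2be(section: bytes) -> list:
    """Return presets as dicts holding width, height and rows of tile indexes."""
    (count,) = struct.unpack_from("<H", section, 0)
    presets = []
    pos = 2
    for _ in range(count):
        _unk1, _unk2, width, height, size = struct.unpack_from("<4BH", section, pos)
        tiles = struct.unpack_from(f"<{width * height}H", section, pos + 6)
        # Tiles run left-to-right, then top-to-bottom.
        rows = [list(tiles[r * width:(r + 1) * width]) for r in range(height)]
        presets.append({"width": width, "height": height, "rows": rows})
        pos += size                     # DataSize includes the 6-byte header
    return presets
```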

Here are some sample presets from MEDIEVAL.DAT:


__________

Style Sprites - L2SS

This data section is found within a FORM data file and has the section name "L2SS"

There is a handful of different objects in the game that can interact with Lemmings, yet are not themselves represented by tile graphics. These include the cannons and the Medieval dragon and catapult.

Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2SS
===============================================================================
SpriteCount     UInt16       The total number of sprites in the style.
Sprites         Sprite[]     Sprite definitions.
===============================================================================

Sprite
===============================================================================
DataSize        UInt16       Size in bytes of the remainder of the Sprite.
Width           UInt16       Width of the Sprite, in pixels.
Height          UInt16       Height of the Sprite, in pixels.
ImagePointers   UInt16[4]    Pointers to the data for the 4 pixel layers.
ImageData       Byte[]       Encoded data representing image content.
===============================================================================

The values of the ImagePointers are relative to the first byte of the Sprites
array. However, they do not account for the DataSize fields, so each offset
needs to be increased by 2 for every DataSize field up to and including the one
belonging to the current sprite.

The offset relative to the first byte of the Sprites array, for sprite S in the
array (beginning at 0), can therefore be calculated with the following formula:

RealImagePointer = ImagePointer + 2 * (S + 1)
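Here is a Python sketch that reads the sprite headers and applies that correction. The function name and the dictionary keys are mine. It returns layer offsets relative to the start of the L2SS section data, which is why the 2 bytes of SpriteCount are added on top of RealImagePointer.

```python
import struct

def parse_l2ss_headers(section: bytes) -> list:
    """Return width, height and section-relative layer offsets for each sprite."""
    (count,) = struct.unpack_from("<H", section, 0)
    sprites_base = 2                 # the Sprites array follows SpriteCount
    pos = sprites_base
    sprites = []
    for s in range(count):
        (size,) = struct.unpack_from("<H", section, pos)
        width, height = struct.unpack_from("<HH", section, pos + 2)
        pointers = struct.unpack_from("<4H", section, pos + 6)
        # Stored pointers leave out the DataSize fields of sprites 0..S.
        layers = [sprites_base + p + 2 * (s + 1) for p in pointers]
        sprites.append({"width": width, "height": height, "layer_offsets": layers})
        pos += 2 + size              # DataSize excludes its own two bytes
    return sprites
```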

As with the other graphics in the game, these sprite graphics are expressed as
4 layers, where each layer represents vertical "stripes" of pixels every 4
columns. The data pointed to by the first ImagePointer represents pixel columns
0, 4, 8, 12, etc. The second ImagePointer represents columns 1, 5, 9, 13, etc.
This continues for the remaining pointers.

The ImageData bytes encode pixel values in such a way that pixels with palette
entry 0 (transparent pixels) are not directly expressed. This results in a
moderate level of compression in most cases, but in some instances will cause
the image data to bloat.

When first processing the data for an image layer, as pointed to by the
ImagePointer, the current X and Y drawing positions within the sprite are
initialized to 0. This corresponds with the top-left pixel of the sprite. When
bytes are read from the data to be used as pixel values, the X position will be
incremented once per byte. A special "newline" command will reset the X
position to 0 and increment Y by 1.

Bytes are processed 4 bits at a time, starting with the high 4 bits of each
byte. The resulting nibbles, here called the "high nibble" and "low nibble",
are bit-packed values containing the following fields:

Field  mccc
Bit    3  0

m = Mode
0 - Copy
1 - Skip

c = Count

Drawing stops when two specific conditions are met: the X position is 0, and
the high nibble is 0xF. Once this occurs, no further drawing is processed for
the current layer.

The actual X position in the final image is equal to the X position within the
layer, multiplied by 4, then with the layer number added (where the first layer
is layer 0). In other words:

FinalX = LayerX * 4 + LayerNum

Pixels are processed first by the high nibble, then by the low nibble.

If Mode is Copy, then Count bytes are read from ImageData and stored in the
pixel buffer. If Mode is Skip, then the X position is increased by Count, but
no bytes are read from ImageData.

Should the value of the low nibble be 0, then, after processing both nibbles, a
"newline" operation occurs: X is reset to 0, and Y is incremented by 1.
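The layer decoding rules above can be sketched in Python like this. The function names are mine, and a few details are my reading of the description rather than confirmed facts: the pixel bytes for a Copy high nibble come right after the command byte, followed by those for the low nibble; the end-of-layer check happens before either nibble is processed; and the output is a linear width * height buffer where 0 means transparent.

```python
def decode_layer(data: bytes, start: int, layer: int,
                 width: int, height: int, pixels: bytearray) -> None:
    """Decode one sprite layer starting at data[start] into pixels (in place)."""
    pos = start
    x = y = 0
    while pos < len(data) and y < height:
        command = data[pos]
        pos += 1
        high, low = command >> 4, command & 0x0F
        if x == 0 and high == 0x0F:
            break                          # end of this layer
        for nibble in (high, low):         # high nibble first, then low
            count = nibble & 0x07
            if nibble & 0x08:              # Skip: advance X, read nothing
                x += count
            else:                          # Copy: next `count` bytes are pixels
                for value in data[pos:pos + count]:
                    final_x = x * 4 + layer
                    if final_x < width:
                        pixels[y * width + final_x] = value
                    x += 1
                pos += count
        if low == 0:                       # newline after both nibbles
            x = 0
            y += 1

def decode_sprite(data: bytes, layer_offsets, width: int, height: int) -> bytes:
    pixels = bytearray(width * height)     # 0 = transparent
    for layer, offset in enumerate(layer_offsets):
        decode_layer(data, offset, layer, width, height, pixels)
    return bytes(pixels)
```

Because each layer only writes to every fourth column, the four layers can be decoded into the same buffer in any order.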

The special sprites from MEDIEVAL.DAT: