CFG: Parsing

Robert Jordan edited this page Jun 21, 2021 · 3 revisions

Terminology

  • Whitespace: (' ') Any spacing character that separates tokens.
  • Comment: (;) Text ignored by parser, lasts until end of line. (Treated as Whitespace)
  • Token: Any group of non-Whitespace/Comment characters.
  • Property: Information assigned to a name, consists of a Key and Value.
  • Key: The name identifier of a Property or Block.
  • Value: The text information associated with a Property's Key.
  • Block: A special Property that contains other Properties. Usually described as the actual contents of said special Property.
  • Child: Any Property or Block contained within another Block.
  • Parent: The Block containing a child Property or Block.
  • Open Brace: ({) Beginning of a Block. (Block property's children)
  • Close Brace: (}) End of a Block. (Block property's children)
  • Depth: The number of Blocks deep a Property is.
  • Root: A Property or Block that is not contained within another Block. (Depth equals 0)

Synonyms

Different terms that can be used to describe the same information as above.

  • Namespace: Another term to describe a Block, usually when discussing separation of information.
  • Tree: Used to describe the entirety of a CFG file (or Block) in certain contexts. Consider the Root block (Lego*) as the trunk, and all children as branches.

Character Types

Rows are listed in the order that parsing occurs. First, all whitespace and comments are normalized, then tokens are parsed.

| Category    | Characters           | Notes                                          |
|-------------|----------------------|------------------------------------------------|
| Comment     | ;                    | Whitespace until Newline                       |
| Newline     | \n ¹                 | End of ; Comments                              |
| Whitespace  | ' ', \t, \n, \r ¹    | Required to separate tokens                    |
| Token       | any other char       | Including { and } characters                   |
| Open Brace  | { ²                  | Match exactly. Only if Value token of property |
| Close Brace | } ²                  | Match exactly                                  |

[1]: The characters listed are Space, Tab ⇆, Line Feed (\n, produced by Enter ↲), and Carriage Return (\r, the second character produced by Enter ↲ on Windows).

[2]: Open and Close Braces can only be matched exactly: the full token must consist of only this one character.
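
The character rules above can be sketched as a small tokenizer. This is an illustrative Python sketch, not LegoRR's actual code: the function name `tokenize` and the sample keys are hypothetical, and it assumes a `;` starts a comment even when it appears in the middle of a token (which follows from comments being normalized before tokens are parsed).

```python
import re

# Only the four whitespace characters listed in the table separate tokens.
TOKEN_RE = re.compile(r"[^ \t\r\n]+")

def tokenize(text):
    """Split CFG text into tokens, treating ';' comments as whitespace."""
    tokens = []
    for line in text.split("\n"):
        # A ';' comment lasts until the end of the line (assumed to apply
        # even when the ';' directly follows other characters).
        code = line.split(";", 1)[0]
        tokens.extend(TOKEN_RE.findall(code))
    return tokens

# Hypothetical sample; the keys are made up for illustration.
sample = "Lego* {\r\n\tSpeed\t5 ; a comment\r\n}\r\n"
print(tokenize(sample))
```

Note that a brace only counts as a brace when it is the whole token, so `{x` stays a single ordinary token.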

Token Types

Rows are listed in the order that parsing occurs.

| Token       | Notes                                                                    |
|-------------|--------------------------------------------------------------------------|
| Close Brace | Matched at any time, does not affect Key/Value parser state. Depth - 1   |
| Key         | First matched token (that is not })                                      |
| Value       | Second matched token (that is not }). Start new property at current Depth |
| Open Brace  | Value token equals {. Depth + 1                                          |

About Depth

Note that Depth is tracked using the current property being parsed.

Any Close Brace encountered will lower the depth of the current property, even if that property has already assigned its Key and is waiting for its Value token next! This can even decrease the Depth below 0!

As for the Open Brace, it can only change the Depth in the expected manner. Once a Value token is parsed, a new property is created. Only after this new property is created is the previous token checked; if it is an Open Brace, the new property's Depth is increased.
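
The Key/Value/Depth behaviour described above can be sketched as a small state machine. This is a Python approximation under stated assumptions, not the game's implementation: `parse` and the sample keys are hypothetical, properties are returned as `(key, value, depth)` tuples, and Depth is a plain signed integer so the "below 0" case is visible.

```python
import re

def parse(text):
    """Return a flat list of (key, value, depth) tuples in file order."""
    props = []
    key = None   # Key of the property being built (None = waiting for a Key)
    depth = 0    # Depth of the property being built
    for line in text.split("\n"):
        code = line.split(";", 1)[0]           # drop ';' comments
        for tok in re.findall(r"[^ \t\r\n]+", code):
            if tok == "}":
                # Matched at any time; Key/Value state is left untouched.
                depth -= 1
            elif key is None:
                key = tok                      # first non-'}' token
            else:
                props.append((key, tok, depth))  # second non-'}' token
                key = None                     # new property starts here...
                if tok == "{":
                    depth += 1                 # ...one level deeper after '{'
    return props

# Hypothetical sample; the keys are made up for illustration.
sample = "Lego* {\n  Main {\n    Speed 5\n  }\n  Name Rock\n}\n"
for prop in parse(sample):
    print(prop)
```

Feeding it `A } B` yields a single property `A` with value `B` at Depth -1, which is the quirk described above: the Close Brace lowers the Depth of a property that already has its Key.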

Storage

CFGProperties are stored in a doubly-linked list. The only attribute that tracks a property's hierarchy in the file is Depth.

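| Type         | Field      | Description                                                         |
|--------------|------------|---------------------------------------------------------------------|
| char*        | TokensData | Tokenized file data allocation (only stored by the first property)  |
| const char*  | Key        | Pointer to key in TokensData                                        |
| const char*  | Value      | Pointer to value in TokensData                                      |
| uint32       | Depth      | Number of Blocks deep the property is                               |
| uint32       | Field10    | Unknown usage, assigned 0                                           |
| CFGProperty* | Next       | Next property read from the file                                    |
| CFGProperty* | Previous   | Previous property read from the file                                |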
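
Because the list is flat, a Block's children have to be recovered from Depth alone. The Python sketch below mirrors only the Key, Value, Depth, Next, and Previous fields; the helper names `link` and `children` are hypothetical, and the forward scan is one plausible way to walk the list, not a confirmed description of how LegoRR does its lookups.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CFGProperty:
    key: str
    value: str
    depth: int
    # Links are excluded from repr/compare to avoid walking the whole list.
    next: Optional["CFGProperty"] = field(default=None, repr=False, compare=False)
    previous: Optional["CFGProperty"] = field(default=None, repr=False, compare=False)

def link(items):
    """Chain (key, value, depth) tuples into a doubly-linked list; return the head."""
    head = prev = None
    for key, value, depth in items:
        node = CFGProperty(key, value, depth)
        node.previous = prev
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head

def children(block):
    """Yield direct children: nodes at depth + 1 until Depth drops back to the block's level."""
    node = block.next
    while node is not None and node.depth > block.depth:
        if node.depth == block.depth + 1:
            yield node
        node = node.next

# Hypothetical sample data; the keys are made up for illustration.
head = link([("Lego*", "{", 0), ("Main", "{", 1), ("Speed", "5", 2), ("Name", "Rock", 1)])
print([child.key for child in children(head)])
```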

Cafeteria

TODO: Reconfirm most of this information later.

The Cafeteria mod manager parses CFG files a bit differently from LegoRR when performing modifications. Keep this in mind when editing Lego.cfg (and other CFG-like files) by hand, so that both programs treat the contents the same way.

Properties are parsed by line (I think). At most one property can be defined on a single line.

Line endings are expected to use the CRLF (\r\n) format. (This also holds true for script.txt.) In most cases, this shouldn't be an issue when editing files on Windows, but many modern text editors (such as VSCode) may default to LF (\n) line endings.
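
If a file has ended up with LF or mixed line endings, it can be converted back before handing it to Cafeteria. A minimal Python sketch, with `to_crlf` as a hypothetical helper name:

```python
import re

def to_crlf(text):
    """Convert every line ending (CRLF, lone CR, or lone LF) to CRLF."""
    # The alternation tries '\r\n' first, so existing CRLF pairs are
    # replaced as a unit and never doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)

print(repr(to_crlf("Speed 5\nName Rock\r\n")))
```

When reading or writing the file from Python, open it with `newline=""` so that Python does not translate the line endings on its own.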

Only the first two tokens on a line are parsed, so unlike LegoRR, invalid // comments or other unexpected information will usually be ignored and cleaned up.

A property is considered to be a block if there is only one token on a line, or if the second(?) token on a line equals {. This is done to handle the fact that some CFG blocks put the Open Brace on the next line.
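
The line-based rules above can be sketched as follows. This is a guess at the behaviour in Python, not Cafeteria's actual code: `classify_line` is a hypothetical name, stripping `;` comments first is an assumption, and treating a line that starts with a lone brace as its own "brace" category is also an assumption, since the rules above do not say how such lines are handled.

```python
import re

def classify_line(line):
    """Return (kind, key, value) for one line of a CFG file."""
    code = line.split(";", 1)[0]                     # assumed: ';' comments are dropped
    tokens = re.findall(r"[^ \t\r\n]+", code)[:2]    # only the first two tokens count
    if not tokens:
        return ("empty", None, None)
    if tokens[0] in ("{", "}"):
        return ("brace", tokens[0], None)            # assumption, see lead-in
    if len(tokens) == 1 or tokens[1] == "{":
        return ("block", tokens[0], None)            # brace may follow on the next line
    return ("property", tokens[0], tokens[1])

# Hypothetical sample lines; the keys are made up for illustration.
for line in ["Main {", "Main", "Speed 5 // stray note", "}"]:
    print(classify_line(line))
```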

CFG modification uses the \ character as a path separator for looking up blocks and properties (where LegoRR uses :: as a path separator in most cases). This effectively restricts block names and property keys from using the \ character.
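
To illustrate the two separators, the Python sketch below walks a path through nested dictionaries. The nested-dict representation, the `lookup` helper, and the sample keys are all assumptions made for the example; neither Cafeteria nor LegoRR is known to store data this way.

```python
def lookup(tree, path, sep="\\"):
    """Follow a separator-delimited path through nested dicts."""
    node = tree
    for part in path.split(sep):
        node = node[part]   # raises KeyError if a block or key is missing
    return node

# Hypothetical data; the keys are made up for illustration.
tree = {"Lego*": {"Main": {"Speed": "5"}}}

print(lookup(tree, "Lego*\\Main\\Speed"))            # Cafeteria-style path
print(lookup(tree, "Lego*::Main::Speed", sep="::"))  # LegoRR-style path
```

Since the path is split on every separator, a key that itself contained `\` could never be addressed, which is the restriction noted above.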