CFG: Parsing
- Whitespace: (`' '`) Any spacing character that separates tokens.
- Comment: (`;`) Text ignored by parser, lasts until end of line. (Treated as Whitespace)
- Token: Any group of non-Whitespace/Comment characters.
- Property: Information assigned to a name, consists of a Key and Value.
- Key: The name identifier of a Property or Block.
- Value: The text information associated with a Property's Key.
- Block: A special Property that contains other Properties. Usually described as the actual contents of said special Property.
- Child: Any Property or Block contained within another Block.
- Parent: The Block containing a child Property or Block.
- Open Brace: (`{`) Beginning of a Block. (Block property's children)
- Close Brace: (`}`) End of a Block. (Block property's children)
- Depth: The number of Blocks deep a Property is.
- Root: A Property or Block that is not contained within another Block. (Depth equals `0`)

Different terms that can be used to describe the same information as above.
- Namespace: Another term to describe a Block, usually when discussing separation of information.
- Tree: Used to describe the entirety of a CFG file (or Block) in certain contexts. Consider the Root block (`Lego*`) as the trunk, and all children as branches.

Rows are listed in the order that parsing occurs. First, all whitespace and comments are normalized, then tokens are parsed.
Category | Characters | Notes |
---|---|---|
Comment | `;` | Whitespace until Newline |
Newline | `\n` ¹ | End of `;` Comments |
Whitespace | `' '`, `\t`, `\n`, `\r` ¹ | Required to separate tokens |
Token | any other char | Including `{` and `}` characters |
Open Brace | `{` ² | Match exactly. Only if Value token of property |
Close Brace | `}` ² | Match exactly |

[1]: The characters listed are Space, Tab⇆, Enter↲, and a second character created by Enter↲.
[2]: Open and Close Braces can only be matched exactly: the full token must consist of only this one character.
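
The tokenization rules in the table above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not LegoRR's actual code; the function name `tokenize` is made up for the example.

```python
import re


def tokenize(text):
    """Split CFG text into tokens using the rules from the table above."""
    # A `;` comment runs until the end of the line and is treated as whitespace.
    without_comments = re.sub(r";[^\n]*", " ", text)
    # Only space, tab, \n and \r separate tokens; everything else is token text.
    return [t for t in re.split(r"[ \t\r\n]+", without_comments) if t]


sample = "Lego* {\n  Main { ; a comment with } inside\n    Key Value\r\n  }\n}"
print(tokenize(sample))
# ['Lego*', '{', 'Main', '{', 'Key', 'Value', '}', '}']

# A brace only counts as a brace when it is the whole token.
print(tokenize("Key {Value"))
# ['Key', '{Value']
```

Note that the `}` inside the comment is discarded with the rest of the comment, and that `{Value` stays an ordinary token because it is not exactly `{`.
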
Rows are listed in the order that parsing occurs.
Token | Notes |
---|---|
Close Brace | Matched at any time, does not affect Key/Value parser state. Depth - 1 |
Key | First matched token (that is not `}`) |
Value | Second matched token (that is not `}`). Start new property at current Depth |
Open Brace | Value token equals `{`. Depth + 1 |

Note that Depth is tracked using the current property being parsed.
Any Close Brace encountered will lower the depth of the current property, even if that property has already been assigned its Key and is waiting for its Value token next! This can even decrease the Depth below `0`!
The Open Brace, by contrast, can only change the Depth in the expected manner. Once a Value token is parsed, a new property is created. Only after this new property has been created is the Value token that was just parsed checked; if it is an Open Brace, the new property's Depth is increased.
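
The Key/Value/Depth rules above can be sketched as a small state machine. This is a minimal illustration under the rules stated in this section, not LegoRR's actual implementation. The function name `parse` and the `(key, value, depth)` tuple layout are invented for the example, and the depth is kept as a plain Python integer, so it simply goes negative instead of behaving like the `uint32` listed in the structure table.

```python
def parse(tokens):
    """Turn a token list into (key, value, depth) tuples."""
    properties = []
    key = None   # Key of the property currently being filled in
    depth = 0    # Depth of the property currently being filled in
    for token in tokens:
        if token == "}":
            # Matched at any time, even while a Key is waiting for its Value.
            depth -= 1
        elif key is None:
            key = token
        else:
            # The Value completes the property at the current depth...
            properties.append((key, token, depth))
            key = None
            # ...and only then is the Value checked for an Open Brace,
            # which makes the *next* property one level deeper.
            if token == "{":
                depth += 1
    return properties


print(parse(["Lego*", "{", "Main", "{", "Key", "Value", "}", "}"]))
# [('Lego*', '{', 0), ('Main', '{', 1), ('Key', 'Value', 2)]

# A stray Close Brace between Key and Value still lowers the depth.
print(parse(["A", "}", "B"]))
# [('A', 'B', -1)]
```

The second call shows the quirk described above: the `}` does not disturb the Key/Value pairing, but the resulting property ends up below depth 0.
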
CFGProperties are stored in a doubly-linked list. The only attribute that tracks a property's hierarchy in the file is Depth.
Type | Value | Description |
---|---|---|
`char*` | TokensData | Tokenized file data allocation (Only stored by first property) |
`const char*` | Key | Pointer to key in TokensData |
`const char*` | Value | Pointer to value in TokensData |
`uint32` | Depth | |
`uint32` | Field10 | Unknown usage, assigned `0` |
`CFGProperty*` | Next | Next property read from the file |
`CFGProperty*` | Previous | Last property read from the file |

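
Because Depth is the only hierarchy information, a property's Parent has to be recovered by walking the list backwards. The sketch below mirrors the table above as a Python dataclass (leaving out TokensData and Field10) and shows one possible way to do that lookup. The helpers `link` and `find_parent` are hypothetical names for this example, and the lookup assumes a well-formed file where braces are balanced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare by identity; nodes reference each other
class CFGProperty:
    key: str
    value: str
    depth: int
    next: Optional["CFGProperty"] = None
    previous: Optional["CFGProperty"] = None


def link(props: List[CFGProperty]) -> List[CFGProperty]:
    """Chain the properties together in file order."""
    for before, after in zip(props, props[1:]):
        before.next = after
        after.previous = before
    return props


def find_parent(prop: CFGProperty) -> Optional[CFGProperty]:
    """Walk backwards to the nearest property with a smaller depth."""
    node = prop.previous
    while node is not None:
        if node.depth < prop.depth:
            return node
        node = node.previous
    return None  # Root properties have no parent


props = link([
    CFGProperty("Lego*", "{", 0),
    CFGProperty("Main", "{", 1),
    CFGProperty("Key", "Value", 2),
    CFGProperty("Other", "{", 1),
])
print(find_parent(props[2]).key)  # Main
print(find_parent(props[3]).key)  # Lego*
```

With unbalanced braces (see the Close Brace quirk above), the nearest shallower property is not guaranteed to be a Block, so a stricter lookup might also check that its Value is `{`.
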
TODO: Reconfirm most of this information later.
The Cafeteria mod manager parses CFG files a bit differently from LegoRR when performing modifications. Keep this in mind when editing `Lego.cfg` (and other CFG-like files) by hand, so that both programs will treat the contents the same.
Properties are parsed by line (I think). At most one property can be defined on a single line.
Line endings are expected to use the CRLF (`\r\n`) format. (This also holds true for `script.txt`.) In most cases, this shouldn't be an issue when editing files on Windows, but many modern text editors (such as VSCode) may default to LF (`\n`) line endings.
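
If a file has ended up with LF or mixed line endings, it can be converted back before handing it to Cafeteria. The helper below is a minimal sketch of that conversion; `to_crlf` is a made-up name and the function works on raw bytes so that the text encoding is left untouched.

```python
def to_crlf(data: bytes) -> bytes:
    """Normalize LF and CRLF line endings to CRLF."""
    # Collapse existing CRLF pairs first so they are not doubled,
    # then expand every remaining LF into CRLF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


print(to_crlf(b"Key1 Value1\nKey2 Value2\r\nKey3 Value3"))
# b'Key1 Value1\r\nKey2 Value2\r\nKey3 Value3'
```

The input would typically come from reading the file in binary mode. Lone `\r` characters are left unchanged by this sketch.
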
Only the first two tokens on a line are parsed, so unlike LegoRR, invalid `//` comments or other unexpected information will usually be ignored and cleaned up.
A property is considered to be a block if there is only one token on a line, or if the second(?) token on a line equals `{`. This is done to handle the fact that some CFG blocks put the Open Brace on the next line.
CFG modification uses the `\` character as a path separator for looking up blocks and properties (where LegoRR uses `::` as a path separator in most cases). This effectively restricts block names and property keys from using the `\` character.
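
As an illustration of the two separators, the sketch below converts a `::`-separated path into a `\`-separated one and rejects names that contain `\`. The function name `legorr_to_cafeteria_path` and the example path are invented for this example; neither program is known to expose such a function.

```python
def legorr_to_cafeteria_path(path: str) -> str:
    """Convert a `::`-separated path into a `\\`-separated one."""
    parts = path.split("::")
    for part in parts:
        # A backslash inside a name would be read as a separator.
        if "\\" in part:
            raise ValueError(f"name contains a backslash: {part!r}")
    return "\\".join(parts)


print(legorr_to_cafeteria_path("Lego*::Main::Key"))
# Lego*\Main\Key
```
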