Defining your parser
Csly uses a parser combinator strategy to generate strongly typed parser instances via the ParserBuilder.BuildParser<T, U> method. The BuildParser method includes two generic arguments: one contains the expression tokens you define and the other is the expected parser output type. More formally, a user-defined parser is of type Parser<IN,OUT> where:
- IN is an enum type with regex [Lexeme] decorators that represent all the tokens (symbols) your language accepts,
- OUT is the expected output type of your parser instance's Parse(...) method once it is invoked.
The output generic type can be a value or reference type. Traditionally, it will be a structure encoding an Abstract Syntax Tree (AST). In the sample parser discussed in the Getting started section, the output type is an integer which happens to contain the result of the arithmetic expression passed to the Parser.Parse(...) method.
More information on expression tokens is available in the Lexer page.
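As an illustration of the IN and OUT roles described above, here is a minimal sketch of a token enum and a matching visitor class. The names ExpressionToken, ExpressionParser, Add and Primary are hypothetical and chosen for this example only; the sketch assumes the sly NuGet package is referenced and that its namespaces are laid out as in the releases this was written against.

```csharp
using sly.lexer;
using sly.parser.generator;

// IN: the token enum. Each member carries a regex [Lexeme] decorator.
public enum ExpressionToken
{
    [Lexeme("[0-9]+")]
    INT = 1,

    [Lexeme("\\+")]
    PLUS = 2,

    // The second argument marks the lexeme as skippable (whitespace).
    [Lexeme("[ \\t]+", true)]
    WS = 3
}

// OUT is int in this sketch: every visitor method returns an int.
public class ExpressionParser
{
    // Terminals arrive as Token<ExpressionToken>, non-terminals as int (OUT).
    [Production("expression: INT PLUS expression")]
    public int Add(Token<ExpressionToken> left, Token<ExpressionToken> plus, int right)
    {
        return left.IntValue + right;
    }

    [Production("expression: INT")]
    public int Primary(Token<ExpressionToken> value)
    {
        return value.IntValue;
    }
}
```

With these two types, the builder would be instantiated as ParserBuilder<ExpressionToken, int>, and the resulting parser evaluates the expression while it walks the rules instead of building an explicit AST.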
The generic types supplied when ParserBuilder is instantiated are checked when ParserBuilder.BuildParser(...) is called. If the build fails, no Exception is thrown. Instead, check the IsError flag on the result returned by BuildParser(...) or, optionally, check whether its Errors list is populated. The list contains error messages giving the reason for the failure, with line and column indicators where the parse failed. Additional typing rules are described below.
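The flag-based error handling can be sketched as follows. This is an assumption-laden example, not the library's reference usage: it assumes BuildParser(...) returns a result wrapper exposing IsError, Errors and Result, that build errors expose a Message property, and that parse errors expose an ErrorMessage property. WordToken, WordParser and BuildCheckDemo are hypothetical names.

```csharp
using System;
using sly.lexer;
using sly.parser.generator;

public enum WordToken
{
    [Lexeme("[a-z]+")]
    WORD = 1
}

public class WordParser
{
    [Production("root: WORD")]
    public string Root(Token<WordToken> word) => word.Value;
}

public static class BuildCheckDemo
{
    public static void Run()
    {
        var builder = new ParserBuilder<WordToken, string>();

        // No exception is expected here, even when the grammar is invalid.
        var buildResult = builder.BuildParser(
            new WordParser(), ParserType.LL_RECURSIVE_DESCENT, "root");

        if (buildResult.IsError)
        {
            // Assumed: each build error carries a human-readable Message.
            foreach (var error in buildResult.Errors)
            {
                Console.WriteLine(error.Message);
            }
            return;
        }

        // Parsing follows the same pattern: inspect the flag, then the list.
        var parseResult = buildResult.Result.Parse("hello");
        if (parseResult.IsError)
        {
            // Assumed: each parse error carries an ErrorMessage.
            foreach (var error in parseResult.Errors)
            {
                Console.WriteLine(error.ErrorMessage);
            }
            return;
        }

        Console.WriteLine(parseResult.Result);
    }
}
```

Checking the flag before touching Result keeps a failed build from surfacing later as a null reference.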
Since the output type is specified as the OUT generic parameter of both ParserBuilder and Parser, all syntax-tree traversal methods must return a value of type OUT or of a type inheriting from it. The same is true for IN types.
Before discussing clause rules, it bears noting that syntax trees (the Parser output type) have terminal (leaf) and non-terminal (branch) nodes. These nodes represent the "wording" of rules encoded in the syntax tree. Clause rules can be encoded in both IN and OUT types and are described here because they affect the tree-traversal methods that you define for your syntax tree.
Clause rules for both IN and OUT types can be classified by the type of object each clause pattern produces:
- Token<IN>: for a terminal clause (rule: MyToken),
- OUT: for a non-terminal clause (rule: nonTerminal),
- List<Token<IN>>: for a repeated terminal clause (rule: MyToken+),
- List<OUT>: for a repeated non-terminal clause (rule: nonTerminal*),
- Group<IN,OUT>: for a group/sub-rule clause (rule: (MyToken nonTerminal)),
- ValueOption<OUT>: for an optional non-terminal clause (rule: nonTerminal?),
- ValueOption<Group<IN,OUT>>: for an optional group clause (rule: (MyToken nonTerminal)?),
- List<Group<IN,OUT>>: for a repeated group clause (rule: (MyToken nonTerminal)*).
Examples for each clause rule pattern are as follows:
- Token<IN> represents a simple value type output,
- OUT means that the clause refers to another rule.
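The mapping between clause patterns and parameter types can be sketched as a set of visitor signatures. This is a catalogue of shapes, not a complete grammar, and it rests on several assumptions: that Group and ValueOption live in the sly.parser.parser namespace, that ValueOption exposes Match and IsSome, and that the repetition and grouping operators require building with ParserType.EBNF_LL_RECURSIVE_DESCENT. ClauseToken and ClauseShapes are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using sly.lexer;
using sly.parser.generator;
using sly.parser.parser; // assumed location of Group<IN,OUT> and ValueOption<OUT>

public enum ClauseToken
{
    [Lexeme("[0-9]+")]
    INT = 1,

    [Lexeme(",")]
    COMMA = 2,

    [Lexeme("[ \\t]+", true)]
    WS = 3
}

// OUT is int in this sketch.
public class ClauseShapes
{
    // Terminal clause: Token<IN>.
    [Production("item: INT")]
    public int Item(Token<ClauseToken> value) => value.IntValue;

    // Repeated terminal clause: List<Token<IN>>.
    [Production("commas: COMMA+")]
    public int Commas(List<Token<ClauseToken>> commas) => commas.Count;

    // Repeated non-terminal clause: List<OUT>.
    [Production("items: item*")]
    public int Items(List<int> values) => values.Sum();

    // Optional non-terminal clause: ValueOption<OUT>.
    [Production("maybeItem: item?")]
    public int MaybeItem(ValueOption<int> item) => item.Match(v => v, () => 0);

    // Optional group clause: ValueOption<Group<IN,OUT>>.
    [Production("maybeTail: (COMMA item)?")]
    public int MaybeTail(ValueOption<Group<ClauseToken, int>> tail) => tail.IsSome ? 1 : 0;

    // Repeated group clause: List<Group<IN,OUT>>.
    [Production("tail: (COMMA item)*")]
    public int Tail(List<Group<ClauseToken, int>> groups) => groups.Count;
}
```

Each parameter lines up positionally with one clause of its rule, so the parameter list of a visitor method reads as a typed restatement of the rule's right-hand side.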
Clause rules can also be classified as piped or non-piped. Piped (|) rules express alternate choices and, like non-piped rules, can be terminal or non-terminal, so the same typing rules apply as for single (non-piped) statements.
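A hedged sketch of how piped clauses might be typed: it assumes the bracketed choice syntax [A | B] of csly's EBNF grammar, that all alternatives in one choice are either terminals or non-terminals, and that Token exposes a TokenID property. ChoiceToken and ChoiceParser are hypothetical names.

```csharp
using sly.lexer;
using sly.parser.generator;

public enum ChoiceToken
{
    [Lexeme("\\+")]
    PLUS = 1,

    [Lexeme("-")]
    MINUS = 2,

    [Lexeme("[0-9]+")]
    INT = 3,

    [Lexeme("[ \\t]+", true)]
    WS = 4
}

// OUT is int in this sketch.
public class ChoiceParser
{
    // A choice between terminals is typed like a single terminal: Token<IN>.
    [Production("signed: [PLUS | MINUS] INT")]
    public int Signed(Token<ChoiceToken> sign, Token<ChoiceToken> value)
    {
        return sign.TokenID == ChoiceToken.MINUS ? -value.IntValue : value.IntValue;
    }

    [Production("plain: INT")]
    public int Plain(Token<ChoiceToken> value) => value.IntValue;

    // A choice between non-terminals is typed like a single non-terminal: OUT.
    [Production("number: [signed | plain]")]
    public int Number(int value) => value;
}
```

Whichever alternative matches, the visitor receives one argument of the same type, which is why piped rules need no typing rules of their own.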
More information is provided in the next section, Implementing a BNF Parser.