Defining your parser

Aldo Salzberg edited this page Oct 23, 2021 · 51 revisions

Parser types

Csly uses a parser combinator strategy to generate strongly typed parser instances through the ParserBuilder.BuildParser<T, U> method. BuildParser works with two generic arguments: the first is the type holding the expression tokens you define, and the second is the expected parser output type. More formally, a user-defined parser is of type Parser<IN,OUT> where:

  • IN is an enum type whose members carry regex or generic [Lexeme] decorators and together represent all the tokens (symbols) your language accepts,
  • OUT is the expected output type of your parser instance's Parse(...) method.

The output generic type can be a value or reference type. Typically, it is a structure encoding an abstract syntax tree (AST). In the sample parser discussed in the Getting started section, the output type is an integer holding the result of the arithmetic expression passed to the Parser.Parse(...) method.

More information on expression tokens is available in the Lexer page.
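As a minimal sketch of the IN and OUT types described above, the following mirrors the arithmetic sample from Getting started as best it can be reconstructed here. The token names (INT, PLUS, WS), the enum name ExpressionToken and the method names are assumptions for illustration; IN is the token enum and OUT is int.

```csharp
using sly.lexer;
using sly.parser.generator;

// IN: every token the language accepts, declared with regex [Lexeme] attributes.
// (Enum and member names are illustrative assumptions.)
public enum ExpressionToken
{
    [Lexeme("[0-9]+")] INT = 1,
    [Lexeme("\\+")] PLUS = 2,
    // The second argument marks whitespace as skippable, so it never reaches the parser.
    [Lexeme("[ \\t]+", true)] WS = 3
}

// OUT is int: every rule method below returns an int.
public class ExpressionParser
{
    [Production("primary: INT")]
    public int Primary(Token<ExpressionToken> intToken) => intToken.IntValue;

    [Production("expression: primary PLUS expression")]
    public int Sum(int left, Token<ExpressionToken> plus, int right) => left + right;

    [Production("expression: primary")]
    public int Single(int value) => value;
}
```

Returning int keeps the sketch short; a parser that builds an AST would use a node base class as OUT instead.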

Typing rules

The generic types supplied when ParserBuilder is instantiated are checked when ParserBuilder.BuildParser(...) is called. These types are then used for syntax tree generation and traversal (see below). If the build fails, no exception is thrown. Instead, check the IsError flag on the result that BuildParser returns (the object wrapping your Parser<T, U>) or, optionally, whether its Errors list is non-null. A non-null list holds error messages with line and column indicators and the reason for the failure. TODO: link to documented errors. Additional typing rules are described below.
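The check described above might look like the sketch below. It assumes, as in the Getting started sample, that BuildParser takes a parser instance, a parser type and a root rule name, and returns a wrapper exposing IsError, Errors and Result; the Message property on each build error is also an assumption. NumberToken, NumberParser and BuildDemo are hypothetical names.

```csharp
using System;
using sly.lexer;
using sly.parser.generator;

public enum NumberToken
{
    [Lexeme("[0-9]+")] INT = 1
}

public class NumberParser
{
    [Production("number: INT")]
    public int Number(Token<NumberToken> digits) => digits.IntValue;
}

public static class BuildDemo
{
    // Returns the parsed value, or null when building or parsing reports errors.
    public static int? TryParse(string source)
    {
        var builder = new ParserBuilder<NumberToken, int>();
        // "number" is the root rule of this hypothetical grammar.
        var buildResult = builder.BuildParser(
            new NumberParser(), ParserType.LL_RECURSIVE_DESCENT, "number");

        if (buildResult.IsError)
        {
            // No exception was thrown; the failure is only visible through the flag.
            foreach (var error in buildResult.Errors)
                Console.WriteLine(error.Message);
            return null;
        }

        var parseResult = buildResult.Result.Parse(source);
        return parseResult.IsError ? (int?)null : parseResult.Result;
    }
}
```

Checking the flag before touching Result avoids calling Parse on a parser that was never built.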

ParserBuilder and Parser generic input and output type convention

Since the output type is specified in the OUT generic parameter of both ParserBuilder and Parser, all syntax-tree traversal methods in your custom parser must return a value of type OUT or of a type inheriting from it. Likewise, any token parameters those traversal methods accept must use the IN generic type defined in ParserBuilder. See, for example, the traversal methods defined in the sample implementation of ExpressionParser.

Csly calls the traversal methods in your parser to build the terminal (leaf) and non-terminal (branch) nodes of the syntax tree. Clause rules can be encoded in both IN and OUT types.

From an object-structure perspective, clause rules for both IN and OUT types can be classified by their patterns. Each clause rule pattern is illustrated below:

  • Token<IN> (rule: MyToken) is a single terminal (leaf); its traversal method typically returns a simple value,
  • OUT (rule: nonTerminal) is the value already produced by another rule's traversal method,
  • List<Token<IN>> (rule: MyToken+) contains a branch of repeated terminal nodes,
  • List<OUT> (rule: nonTerminal*) contains the values of a repeated non-terminal, possibly none,
  • Group<IN,OUT> (rule: (MyToken nonTerminal)) contains a parenthesized subtree of nodes,
  • ValueOption<OUT> (rule: nonTerminal?) specifies that a branch contains an optional non-terminal,
  • ValueOption<Group<IN,OUT>> (rule: (MyToken nonTerminal)?) specifies that a subtree is optional as a whole,
  • List<Group<IN,OUT>> (rule: (MyToken nonTerminal)*) contains a repeated subtree.
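To make the mapping from clause patterns to parameter types concrete, here is a minimal sketch covering three of the patterns listed above: a single terminal, a single non-terminal, and a repeated non-terminal. It assumes a repeated non-terminal arrives as a list of OUT values and that the repetition operator requires csly's EBNF parser type; SumToken, SumParser and the rule names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using sly.lexer;
using sly.parser.generator;

public enum SumToken
{
    [Lexeme("[0-9]+")] INT = 1,
    [Lexeme("\\+")] PLUS = 2,
    [Lexeme("[ \\t]+", true)] WS = 3
}

public class SumParser
{
    // Token<IN>: a single terminal clause.
    [Production("number: INT")]
    public int Number(Token<SumToken> digits) => digits.IntValue;

    // Token<IN> followed by OUT: a terminal, then the value of another rule.
    [Production("tail: PLUS number")]
    public int Tail(Token<SumToken> plus, int value) => value;

    // OUT followed by List<OUT>: a non-terminal, then a repeated non-terminal.
    [Production("total: number tail*")]
    public int Total(int first, List<int> rest) => first + rest.Sum();
}
```

Each method parameter lines up, in order, with one clause on the right-hand side of its rule. The Group and ValueOption patterns follow the same positional convention but are left out here to keep the sketch small.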

Clause rules can also be classified as piped or non-piped. An alternate, or piped (|), choice rule is itself either terminal or non-terminal, so the same typing rules apply as for single (non-piped) clauses.
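A piped choice between terminals could be sketched as follows. The square-bracket syntax for the choice and the use of the EBNF parser type are assumptions about csly's grammar notation, and CalcToken, CalcParser and the rule names are hypothetical. The point shown is only that the choice clause is received as a plain Token<IN>, exactly like a single terminal.

```csharp
using sly.lexer;
using sly.parser.generator;

public enum CalcToken
{
    [Lexeme("[0-9]+")] INT = 1,
    [Lexeme("\\+")] PLUS = 2,
    [Lexeme("-")] MINUS = 3,
    [Lexeme("[ \\t]+", true)] WS = 4
}

public class CalcParser
{
    [Production("number: INT")]
    public int Number(Token<CalcToken> digits) => digits.IntValue;

    // The piped clause matches either PLUS or MINUS and is typed as one Token<IN>.
    [Production("operation: number [PLUS | MINUS] number")]
    public int Operation(int left, Token<CalcToken> op, int right)
        => op.TokenID == CalcToken.PLUS ? left + right : left - right;
}
```

A choice between non-terminals would, by the same reasoning, be received as a single OUT value.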

More information is provided in the next section, Implementing a BNF Parser.

Getting started ⬅️ Defining your parser ➡️ Implementing a BNF Parser