EBNF parser
An EBNF parser is an extension of a BNF parser. For a better understanding, please refer to the BNF parser page first, as it contains information shared by both BNF and EBNF parsers.
You can use the EBNF repetition modifiers:
- `*` to repeat the same terminal or non-terminal zero or more times
- `+` to repeat the same terminal or non-terminal one or more times

For repeated elements, the values passed to `[Production]` methods are:
- `List<TOut>` for a repeated non-terminal
- `List<Token<TIn>>` for a repeated terminal
```csharp
[Production("listElements: value additionalValue*")]
public JSon listElements(JSon head, List<JSon> tail)
{
    JList values = new JList(head);
    values.AddRange(tail);
    return values;
}
```
See EBNFJsonParser.cs for a complete EBNF JSON parser.
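For a repeated terminal, the same pattern applies with `List<Token<TIn>>`. The sketch below is hypothetical: the `ListToken` enum, its `ID` terminal and the `ListParser` class are invented for illustration, and the `sly.lexer` and `sly.parser.generator` namespaces are assumed to be the ones shipped with the CSLY package.

```csharp
using System.Collections.Generic;
using System.Linq;
using sly.lexer;              // Token<T> and lexer attributes (assumed CSLY namespace)
using sly.parser.generator;   // [Production] (assumed CSLY namespace)

// Hypothetical lexer enum, for illustration only.
public enum ListToken
{
    [AlphaId]
    ID = 1
}

public class ListParser
{
    // ID+ matches one or more ID tokens.
    // Because ID is a terminal, the matches arrive as a List<Token<ListToken>>.
    [Production("identifiers: ID+")]
    public string Identifiers(List<Token<ListToken>> ids)
    {
        return string.Join(",", ids.Select(id => id.Value));
    }
}
```

With `*` instead of `+` the signature would stay the same; the list would simply be empty when no token matches.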
The `?` modifier marks a token or non-terminal as optional:
- for a token, the `Token<TIn>` parameter has its `IsEmpty` property set to `true` when the matching token is absent.
- for a non-terminal, the visitor method gets a `ValueOption<TOut>` instead of a `TOut`. The parameter can then be tested for emptiness with the `IsNone` property.
```csharp
// optional token
[Production("block: A B? C")]
public AST listElements(Token<TIn> a, Token<TIn> b, Token<TIn> c)
{
    if (b.IsEmpty) {
        // B is absent: do something useful
    }
    else {
        string bValue = b.Value;
        // B is present: do something else useful
    }
    // build and return the AST node here
}

// optional non-terminal: a and c are tokens, B is a non-terminal producing a string
[Production("root2 : a B? c ")]
public string root2(Token<OptionTestToken> a, ValueOption<string> b, Token<OptionTestToken> c)
{
    StringBuilder r = new StringBuilder();
    r.Append("R(");
    r.Append(a.Value);
    r.Append(b.Match(v => $",{v}", () => ",<none>"));
    r.Append($",{c.Value}");
    r.Append(")");
    return r.ToString();
}
```
You can define groups (also known as sub-rules) in a production rule. A group is a sequence of terminals or non-terminals. Groups only accept the following items:
- terminals: `TERM`
- discarded terminals: `TERM[d]`
- non-terminals: `nonterm`
- choices: `[ SOME | TERM | OR | OTHER ]` (see alternate choices below)

Modifiers are not allowed within a group (except discard on terminals).

The matching method parameter for a group is a `Group<TIn,TOut>`. A `Group<TIn,TOut>` is a list of `Token<TIn>` or `TOut` values. Values in the group are listed in the same order as their corresponding clauses. `Group<TIn,TOut>` exposes methods to ease access to values.

Groups can be "multiplied" using a modifier. In this case the value passed to the method is a `List<Group<TIn,TOut>>`. Groups can also be made optional using the `?` operator. The value is then a `ValueOption<Group<TIn,TOut>>`.
```csharp
[Production("listElements: value (COMMA [d] value)* ")]
public JSon listElements(JSon head, List<Group<JsonToken,JSon>> tail)
{
    JList values = new JList(head);
    // COMMA is discarded, so index 0 of each group is its value
    values.AddRange(tail.Select((Group<JsonToken,JSon> group) => group.Value(0)).ToList<JSon>());
    return values;
}

[Production("rootOption : A ( SEMICOLON [d] A )? ")]
public string rootOption(Token<GroupTestToken> a, ValueOption<Group<GroupTestToken, string>> option)
{
    StringBuilder builder = new StringBuilder();
    builder.Append("R(");
    builder.Append(a.Value);
    var gg = option.Match(
        (Group<GroupTestToken, string> group) =>
        {
            var aToken = group.Token(0).Value;
            builder.Append($";{aToken}");
            return group;
        },
        () =>
        {
            builder.Append(";");
            builder.Append("a");
            var g = new Group<GroupTestToken, string>();
            g.Add("<none>", "<none>");
            return g;
        });
    builder.Append(")");
    return builder.ToString();
}
```
In some cases you do not want to write many production rules that differ only by a single terminal or non-terminal clause. For these cases you can use the `|` operator. Alternate choices are grouped together between brackets `[ ... ]`, and a pipe `|` separates each choice:
```csharp
public class AlternateChoiceTestTerminal
{
    [Production("choice : [ a | b | c]")]
    public string Choice(Token<OptionTestToken> c)
    {
        return c.Value;
    }
}
```
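The `?`, `+` and `*` modifiers are allowed:
```csharp
public class AlternateChoiceTestTerminal
{
    [Production("choice : [ a | b | c]*")]
    public string Choice(List<Token<OptionTestToken>> c)
    {
        // c holds every matched token; join their values (needs System.Linq)
        return string.Join(",", c.Select(token => token.Value));
    }
}
```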
A choice group made of terminals (and only terminals) can be discarded with the `[d]` specifier (see ignoring syntax sugar tokens below):
```csharp
public class AlternateChoiceTestTerminal
{
    [Production("choice : a [ a | b | c] [d]")]
    public string Choice(Token<OptionTestToken> firstTokenOnly)
    {
        // the discarded choice group is not passed to the method
        return firstTokenOnly.Value;
    }
}
```
Sometimes tokens do not carry any semantic value. Their only purpose is to denote syntactic structure.
For example, in C-like languages, curly brackets (`{`) only denote the beginning of blocks and do not add any other information. Their only use is to guide the syntax parser. So we provide a way to dismiss these tokens so that they are not passed to the visitor methods.

The `[d]` (d for discard) modifier marks a token as ignored. The `[d]` modifier only makes sense when applied to a token. If applied to a non-terminal it is simply ignored.

Here is an example for a C block statement:
```csharp
[Production("block: LBRACKET [d] statement* RBRACKET [d]")]
public AST listElements(List<AST> statements)
{
    // both brackets are discarded, so only the statements are passed in
    // any useful code
}
```
Sometimes it is easier to define lexemes directly in production rules instead of having to define a lexer.

> ⚠ This feature only works when used with a Generic Lexer. The Regex Lexer is not supported.

The EBNF syntax allows you to explicitly define a token inside a production rule. These tokens are surrounded by single quotes (`'`). An enum lexer must still be defined for compatibility reasons.

Explicit tokens must be:
- either a keyword token (in this case the identifier pattern of the lexer is used; it defaults to alpha)
- or a sugar token

Here is a really simple parser demonstrating the use of explicit tokens:
```csharp
using System.Collections.Generic;
using System.Text;
using sly.lexer;
using sly.parser.generator;

// the lexer only defines an ID pattern and a double token
// other tokens will be defined explicitly in grammar rules
public enum Lex
{
    [AlphaId]
    Id,

    [Double]
    Dbl
}

public class Parse
{
    [Production("program : statement*")]
    public string Program(List<string> statements)
    {
        StringBuilder builder = new StringBuilder();
        foreach (var statement in statements)
        {
            builder.AppendLine(statement);
        }
        return builder.ToString();
    }

    // '=' is an explicit sugar token, discarded with [d]
    [Production("statement : Id '='[d] Parse_expressions ")]
    public string Assignment(Token<Lex> id, string expression)
    {
        return $"{id.Value} = {expression}";
    }

    [Production("condition : Id '=='[d] Parse_expressions ")]
    public string Condition(Token<Lex> id, string expression)
    {
        return $"{id.Value} == {expression}";
    }

    // 'if', 'then' and 'else' are explicit keyword tokens, all discarded
    [Production("statement : 'if'[d] condition 'then'[d] statement 'else'[d] statement")]
    public string IfThenElse(string condition, string thenStatement, string elseStatement)
    {
        StringBuilder builder = new StringBuilder();
        builder.AppendLine($"{condition} :");
        builder.AppendLine($" - {thenStatement}");
        builder.AppendLine($" - {elseStatement}");
        return builder.ToString();
    }

    #region expressions

    [Operand]
    [Production("operand : Id")]
    [Production("operand : Dbl")]
    public string Operand(Token<Lex> oper)
    {
        return oper.Value;
    }

    [Infix("'+'", Associativity.Left, 10)]
    public string Plus(string left, Token<Lex> oper, string right)
    {
        return $"( {left} + {right} )";
    }

    [Infix("'*'", Associativity.Left, 20)]
    public string Times(string left, Token<Lex> oper, string right)
    {
        return $"( {left} * {right} )";
    }

    #endregion
}
```
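To try the parser above, it has to be built with the EBNF parser type. The following is only a sketch: it assumes the CSLY `ParserBuilder<TIn,TOut>` API (`BuildParser`, `ParserType.EBNF_LL_RECURSIVE_DESCENT`, and the `IsError`, `Errors` and `Result` members), none of which is described on this page. The `ExplicitTokensDemo` class and the sample input are hypothetical.

```csharp
using System;
using sly.parser.generator;   // ParserBuilder, ParserType (assumed CSLY namespace)

public static class ExplicitTokensDemo
{
    public static void Main()
    {
        // Lex and Parse are the enum and the class from the example above.
        var builder = new ParserBuilder<Lex, string>();

        // "program" is the root rule of the grammar.
        var build = builder.BuildParser(new Parse(), ParserType.EBNF_LL_RECURSIVE_DESCENT, "program");
        if (build.IsError)
        {
            Console.WriteLine($"{build.Errors.Count} error(s) while building the parser");
            return;
        }

        // Hypothetical input; numbers are written with a decimal part
        // because the lexer only defines a double token.
        var result = build.Result.Parse("if a == 1.0 then b = 2.0 else b = 3.0");
        if (result.IsError)
        {
            Console.WriteLine($"{result.Errors.Count} error(s) while parsing");
            return;
        }

        Console.WriteLine(result.Result);
    }
}
```

Checking `IsError` after both the build and the parse step keeps grammar definition problems separate from problems in the parsed input.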
### Under the hood: meta considerations on EBNF parsers ###
The EBNF notation has been implemented in CSLY using the BNF notation: the EBNF parser builder is itself built with the BNF parser builder. Incidentally, the EBNF parser builder is a good and complete example of a BNF parser: [RuleParser.cs](https://github.com/b3b00/csly/blob/master/sly/parser/generator/RuleParser.cs)

The full grammar for an EBNF rule is described in [EBNF rules grammar](EBNF-rules-grammar).