EBNF parser
An EBNF parser is an extension of a BNF parser, so for a better understanding please read the BNF parser page first.
You can use the following EBNF notation:
- '*' repeats the same terminal or non-terminal zero or more times
- '+' repeats the same terminal or non-terminal one or more times
For repeated elements, the values passed to [Production] methods are:
- List<TOut> for a repeated non-terminal
- List<Token<TIn>> for a repeated terminal
[Production("listElements: value additionalValue*")]
public JSon listElements(JSon head, List<JSon> tail)
{
    // additionalValue is a non-terminal, so its repetition arrives as a List<JSon>
    JList values = new JList(head);
    values.AddRange(tail);
    return values;
}
See EBNFJsonParser.cs for a complete EBNF JSON parser.
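To make the shape of these values concrete, here is a minimal standalone sketch that does not use CSLY at all: it hand-writes a matcher for listElements: value additionalValue*, assuming additionalValue : COMMA value, plain string tokens and integer values. The names (RepetitionSketch, ListElements, Value, Visit) are hypothetical; the point is only that the repeated clause is collected into a list before a visitor with the same shape as listElements combines head and tail.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, independent of CSLY: a hand-written matcher for
//   listElements : value additionalValue*
//   additionalValue : COMMA value
// where tokens are plain strings and values are integers.
public static class RepetitionSketch
{
    // Parses e.g. ["1", ",", "2", ",", "3"] into [1, 2, 3].
    public static List<int> ListElements(IReadOnlyList<string> tokens)
    {
        int pos = 0;
        int head = Value(tokens, ref pos);

        // '*': match additionalValue zero or more times, collecting each result in a list.
        var tail = new List<int>();
        while (pos < tokens.Count && tokens[pos] == ",")
        {
            pos++; // consume COMMA
            tail.Add(Value(tokens, ref pos));
        }

        if (pos != tokens.Count)
            throw new FormatException($"unexpected token '{tokens[pos]}' at position {pos}");

        return Visit(head, tail);
    }

    // Same shape as the listElements visitor: the head value plus a list for the repetition.
    private static List<int> Visit(int head, List<int> tail)
    {
        var values = new List<int> { head };
        values.AddRange(tail);
        return values;
    }

    // value : an integer literal.
    private static int Value(IReadOnlyList<string> tokens, ref int pos)
    {
        if (pos >= tokens.Count || !int.TryParse(tokens[pos], out int v))
            throw new FormatException($"integer expected at position {pos}");
        pos++;
        return v;
    }
}
```

With a single value such as { "7" }, the repetition matches zero times and tail is an empty list, so the visitor can always call AddRange without a null check.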
The '?' modifier makes a token or a non-terminal optional.
- For a token, the Token<TIn> parameter has its IsEmpty property set to true when the matching token is absent.
- For a non-terminal, the visitor method gets a ValueOption<TOut> instead of a TOut. The parameter can then be tested for emptiness with its IsNone property.
// optional token
[Production("block: A B? C")]
public AST block(Token<TIn> a, Token<TIn> b, Token<TIn> c)
{
    AST result = null;
    if (b.IsEmpty)
    {
        // B is absent: do something useful
    }
    else
    {
        string bValue = b.Value;
        // B is present: do something else, still useful
    }
    return result;
}
// optional non-terminal: b is empty when B is absent
[Production("root2 : a B? c ")]
public string root2(Token<OptionTestToken> a, ValueOption<string> b, Token<OptionTestToken> c)
{
    StringBuilder r = new StringBuilder();
    r.Append("R(");
    r.Append(a.Value);
    // Match calls the first function with B's value, or the second one when B is absent
    r.Append(b.Match(v => $",{v}", () => ",<none>"));
    r.Append($",{c.Value}");
    r.Append(")");
    return r.ToString();
}
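The following standalone sketch (not CSLY code) mimics the optional non-terminal case of root2 : a B? c. Opt<T> is a hypothetical stand-in that models only the IsNone and Match members described above, and B is assumed to match the single token b and to return its text. When B is absent, nothing is consumed and an empty option is produced, which the visitor turns into ",<none>" just as root2 does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an option value (not CSLY's ValueOption<T>):
// only IsNone and Match are modelled.
public sealed class Opt<T>
{
    private readonly T _value;

    private Opt(T value, bool isNone)
    {
        _value = value;
        IsNone = isNone;
    }

    public bool IsNone { get; }

    public static Opt<T> Some(T value) => new Opt<T>(value, false);
    public static Opt<T> None() => new Opt<T>(default(T), true);

    // Calls `some` with the value when present, otherwise calls `none`.
    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none) =>
        IsNone ? none() : some(_value);
}

public static class OptionSketch
{
    // Hand-written matcher for "root2 : a B? c", where B is assumed to be a
    // non-terminal matching the single token "b".
    public static string Root2(IReadOnlyList<string> tokens)
    {
        int pos = 0;
        string a = Expect(tokens, ref pos, "a");

        // '?': if B matches, wrap its result; otherwise produce an empty option
        // without consuming any token.
        Opt<string> b = pos < tokens.Count && tokens[pos] == "b"
            ? Opt<string>.Some(tokens[pos++])
            : Opt<string>.None();

        string c = Expect(tokens, ref pos, "c");
        if (pos != tokens.Count)
            throw new FormatException($"unexpected token '{tokens[pos]}' at position {pos}");

        // Same formatting as the root2 visitor.
        string bPart = b.Match(v => $",{v}", () => ",<none>");
        return $"R({a}{bPart},{c})";
    }

    private static string Expect(IReadOnlyList<string> tokens, ref int pos, string expected)
    {
        if (pos >= tokens.Count || tokens[pos] != expected)
            throw new FormatException($"'{expected}' expected at position {pos}");
        return tokens[pos++];
    }
}
```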
You can define groups (also known as sub-rules) in a production rule. A group is a sequence of terminals or non-terminals. Modifiers are not allowed within a group; only the discard modifier (see below) is allowed, on terminals.
The parameter of the matching method for a group is a Group<TIn,TOut>. A Group<TIn,TOut> is a list of Token<TIn> or TOut values, listed in the same order as their corresponding clauses. Group<TIn,TOut> exposes methods to ease access to these values.
Groups can be repeated using a repetition modifier such as '*'. In this case the value passed is a List<Group<TIn,TOut>>.
Groups can also be made optional using the '?' modifier. The value passed is then a ValueOption<Group<TIn,TOut>>.
[Production("listElements: value (COMMA [d] value)* ")]
public JSon listElements(JSon head, List<Group<JsonToken,JSon>> tail)
{
    JList values = new JList(head);
    // COMMA is discarded, so each group holds a single value, at index 0
    // (Select and ToList require using System.Linq;)
    values.AddRange(tail.Select((Group<JsonToken,JSon> group) => group.Value(0)).ToList<JSon>());
    return values;
}
[Production("rootOption : A ( SEMICOLON [d] A )? ")]
public string rootOption(Token<GroupTestToken> a, ValueOption<Group<GroupTestToken, string>> option)
{
    StringBuilder builder = new StringBuilder();
    builder.Append("R(");
    builder.Append(a.Value);
    var gg = option.Match(
        // the group is present: SEMICOLON is discarded, so Token(0) is the second A
        (Group<GroupTestToken, string> group) =>
        {
            var aToken = group.Token(0).Value;
            builder.Append($";{aToken}");
            return group;
        },
        // the group is absent: this branch also returns a group, here a placeholder one
        () =>
        {
            builder.Append(";");
            builder.Append("a");
            var g = new Group<GroupTestToken, string>();
            g.Add("<none>", "<none>");
            return g;
        });
    builder.Append(")");
    return builder.ToString();
}
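A group value can be pictured as an ordered list whose items are either tokens or visitor results. The sketch below is a hypothetical model of that idea, not CSLY's Group<TIn,TOut>: SketchGroup<TOut> stores token text as plain strings, and its Token(i) and Value(i) methods only echo the accessor names used in the examples above. GroupDemo shows why listElements reads group.Value(0): when COMMA is marked [d], it is not stored in the group, so the value ends up at index 0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a group value (not CSLY's Group<TIn,TOut>): an ordered list
// in which each item is either a token (terminal clause) or a TOut (non-terminal clause).
public sealed class SketchGroup<TOut>
{
    private readonly List<(bool IsToken, string Token, TOut Value)> _items =
        new List<(bool IsToken, string Token, TOut Value)>();

    public int Count => _items.Count;

    // Items are appended in the same order as the clauses of the group.
    public void AddToken(string token) => _items.Add((true, token, default(TOut)));
    public void AddValue(TOut value) => _items.Add((false, (string)null, value));

    public string Token(int index) =>
        _items[index].IsToken
            ? _items[index].Token
            : throw new InvalidOperationException($"item {index} is a value, not a token");

    public TOut Value(int index) =>
        !_items[index].IsToken
            ? _items[index].Value
            : throw new InvalidOperationException($"item {index} is a token, not a value");
}

public static class GroupDemo
{
    // Group for the clauses "COMMA value" matched on ", 2": both items are kept, in clause order.
    public static SketchGroup<int> WithComma()
    {
        var g = new SketchGroup<int>();
        g.AddToken(",");
        g.AddValue(2);
        return g;
    }

    // Group for "COMMA [d] value": the discarded COMMA is never added,
    // so the value sits at index 0.
    public static SketchGroup<int> WithDiscardedComma()
    {
        var g = new SketchGroup<int>();
        g.AddValue(2);
        return g;
    }
}
```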
Sometimes you don't want to write many production rules when those rules differ only by a single terminal or non-terminal clause. In such cases you can use the | operator: alternate choices are grouped together between square brackets [ ... ], and a pipe | separates each choice:
public class AlternateChoiceTestTerminal
{
    [Production("choice : [ a | b | c]")]
    public string Choice(Token<OptionTestToken> c)
    {
        // c is the matched token, whichever alternative it is
        return c.Value;
    }
}
Caution! Terminals and non-terminals cannot be mixed within the same choice group.
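As a rough illustration (independent of CSLY, with hypothetical names), the rule choice : [ a | b | c ] accepts exactly one token that must be one of the listed alternatives, and the visitor receives that token whichever alternative matched:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, independent of CSLY, of what "choice : [ a | b | c ]" accepts.
public static class ChoiceSketch
{
    private static readonly string[] Alternatives = { "a", "b", "c" };

    public static string Choice(IReadOnlyList<string> tokens)
    {
        // Exactly one token, and it must be one of the alternatives.
        if (tokens.Count != 1 || Array.IndexOf(Alternatives, tokens[0]) < 0)
            throw new FormatException(
                $"expected one of [ {string.Join(" | ", Alternatives)} ]");
        return Visit(tokens[0]);
    }

    // Same shape as the Choice visitor: a single parameter for the matched alternative.
    private static string Visit(string c) => c;
}
```

Without the | operator, the same language would need three separate rules (choice : a, choice : b and choice : c).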
Sometimes tokens do not carry any semantic value; their only purpose is to denote syntactic structure.
For example, in C-like languages, brackets ('{') only denote the beginning of blocks and add no other information. Their only use is to guide the syntax parser, so we propose a way to dismiss these tokens from the visitor methods.
The [d] (d for discard) modifier marks a token as ignored: the token must still be present in the input, but it is not passed to the visitor method.
The [d] modifier only makes sense when applied to a token. If applied to a non-terminal, it is simply ignored.
Here is an example for a C block statement:
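[Production("block: LBRACKET [d] statement* RBRACKET [d]")]
public AST block(List<AST> statements)
{
    // both brackets are discarded: only the statements are passed in
    AST result = null;
    // any useful code building result from statements
    return result;
}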
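The effect of [d] on the block rule can be sketched without CSLY as follows. This is a hypothetical hand-written matcher in which statements are simplified to single tokens other than braces; the braces must be present for the input to be accepted, but they never reach the visitor, which only receives the list of statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, independent of CSLY, of
//   block : LBRACKET [d] statement* RBRACKET [d]
// Statements are simplified to single tokens other than the braces.
public static class DiscardSketch
{
    public static List<string> Block(IReadOnlyList<string> tokens)
    {
        int pos = 0;
        Expect(tokens, ref pos, "{"); // LBRACKET [d]: must be present, but its value is dropped

        var statements = new List<string>();
        while (pos < tokens.Count && tokens[pos] != "{" && tokens[pos] != "}")
            statements.Add(tokens[pos++]); // statement*

        Expect(tokens, ref pos, "}"); // RBRACKET [d]
        if (pos != tokens.Count)
            throw new FormatException($"unexpected token '{tokens[pos]}' at position {pos}");

        return Visit(statements);
    }

    // Same shape as the block visitor: only the statements, no bracket tokens.
    private static List<string> Visit(List<string> statements) => statements;

    private static void Expect(IReadOnlyList<string> tokens, ref int pos, string expected)
    {
        if (pos >= tokens.Count || tokens[pos] != expected)
            throw new FormatException($"'{expected}' expected at position {pos}");
        pos++;
    }
}
```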
The EBNF notation has been implemented in CSLY using the BNF notation: the EBNF parser builder is itself built with the BNF parser builder. Incidentally, the EBNF parser builder is a good and complete example of a BNF parser: RuleParser.cs
The full grammar for an EBNF rule is given on the EBNF rules grammar page.