# Lexer fluent API
The fluent lexer builder still needs an enum to declare the different tokens, but this enum does not need to be decorated with C# attributes. Instead, a chain of API calls defines the different tokens. As the purpose of a fluent API is to allow fluent use, every fluent method is described in detail below.
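For instance, a bare enum with no `[Lexeme]` attributes is enough (a minimal sketch; the enum name and members below are illustrative):

```csharp
// a plain token enum: no C# attributes needed,
// the builder declares all the lexemes
public enum SimpleToken
{
    IDENTIFIER,
    NUMBER
}
```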
The lexer fluent builder is declared as follows:

```csharp
var lexerBuilder = FluentLexerBuilder<MyToken>.NewBuilder();
```
First, the lexer can be configured with some fluent methods (a short sketch follows this list):
- `IgnoreEol(bool ignore)`: ignore end-of-line characters if the parameter is `true`.
- `IgnoreWhiteSpace(bool ignore)`: ignore white spaces if the parameter is `true`.
- `IsIndentationAware(bool ignore)`: insert INDENT and UINDENT tokens to handle an indentation-aware language if the parameter is `true`. See Indented languages.
- `IgnoreKeywordCase(bool ignore)`: ignore keyword casing if the parameter is `true`.
- `WithCallBack(IN tokenId, Func<Token<IN>, Token<IN>> callback)`: defines a callback to be called when a token `tokenId` is scanned. See Generic Lexer Callbacks.
- `UseLexerPostProcessor(LexerPostProcess<IN> lexerPostProcessor)`: defines a lexer post-processor to be called when tokenization ends. See Lexer post processing.
- `UseExtensionBuilder(Action<IN, LexemeAttribute, GenericLexer<IN>> extensionBuilder)`: defines a lambda used to extend the generic lexer. See Extending the generic lexer.
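For instance, a minimal configuration chain might look like this (a sketch only; it reuses the `MyToken` enum from the full example at the end of this page):

```csharp
var builder = FluentLexerBuilder<MyToken>.NewBuilder()
    .IgnoreEol(true)          // skip end-of-line tokens
    .IgnoreWhiteSpace(true)   // skip white spaces
    .IgnoreKeywordCase(true); // keywords match regardless of casing
```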
Here is the list of the methods used to define tokens; see Generic Lexer for more detail on generic tokens. A short sketch follows the list.
- `AlphaNumId(IN tokenId)`
- `AlphaId(IN tokenId)`
- `AlphaNumDashId(IN tokenId)`
- `CustomId(IN tokenId, string start, string end)`
- `Double(IN tokenId, string decimalDelimiter)`
- `Int(IN tokenId)`
- `Integer(IN tokenId)`
- `Sugar(IN tokenId, string token)`
- `Date(IN tokenId, DateFormat format, char separator)`
- `Hexa(IN tokenId, string prefix)`
- `Keyword(IN tokenId, string token)`
- `Keyword(IN tokenId, string[] tokens)`
- `String(IN tokenId, string delimiter = "\"", string escapeChar = "\\")`
- `Character(IN tokenId, string delimiter = "'", string escapeChar = "\\")`
- `SingleLineComment(IN tokenId, string start, bool doNotIgnore = false)`
- `MultiLineComment(IN tokenId, string start, string end, bool doNotIgnore = false)`
- `UpTo(IN tokenId, string pattern)`
- `UpTo(IN tokenId, params string[] patterns)`
- `Extension(IN tokenId)`
- `Regex(IN tokenId, string regex, bool isSkippable = false, bool isEol = false)`: defines a regex token. This cannot be used together with generic tokens.
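For example, a few token definitions chained on the builder (a sketch continuing the `builder` from the previous snippet, using the `MyToken` members defined in the full example at the end of this page):

```csharp
builder
    .AlphaNumId(MyToken.ID)                  // identifiers
    .Int(MyToken.INT)                        // integer literals
    .SingleLineComment(MyToken.COMMENT, "#") // '#' starts a comment
    .Sugar(MyToken.DOLLAR, "$");             // '$' punctuation token
```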
After every token definition, the following methods can be chained to further configure the token (a sketch follows this list):
- `WithLabel(string lang, string label)`
- `WithLabels(params (string lang, string label)[] labels)`
- `WithModes(params string[] modes)`
- `OnChannel(int channel)`
- `PopMode()`
- `PushToMode(string targetMode)`
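For instance, labels can be attached to a token for localized error messages, and a token can be routed to a channel (a sketch; the label texts are illustrative):

```csharp
builder
    .Sugar(MyToken.DOLLAR, "$")
        .WithLabel("en", "dollar sign")  // label used in English error messages
        .WithLabel("fr", "signe dollar") // label used in French error messages
    .SingleLineComment(MyToken.COMMENT, "#")
        .OnChannel(Channels.Main);       // keep comments on the main channel
```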
To get the final `ILexer<IN>` object, simply call the `Build(string lang)` method (the `lang` parameter sets the i18n configuration for error messages):
```csharp
var lexerBuilder = FluentLexerBuilder<MyToken>.NewBuilder();
// .... token definitions
BuildResult<ILexer<MyToken>> lexerResult = lexerBuilder.Build("en");
// check if lexerResult is OK
if (lexerResult.IsOk)
{
    ILexer<MyToken> lexer = lexerResult.Result;
}
```
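If the build fails, the result carries the errors instead of a lexer. A minimal sketch, assuming `BuildResult<T>` exposes an `Errors` collection whose items have a `Message` property:

```csharp
if (!lexerResult.IsOk)
{
    // report every lexer initialization error (assumed API surface)
    foreach (var error in lexerResult.Errors)
    {
        Console.WriteLine(error.Message);
    }
}
```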
Here is a complete example using modes and channels:

```csharp
public enum MyToken
{
    ID,
    INT,
    COMMENT,
    DOLLAR,
    MONEY,
    EURO
}

public class FluentLexerTest
{
    public void LexerTest()
    {
        var lexerBuilder = FluentLexerBuilder<MyToken>.NewBuilder();
        BuildResult<ILexer<MyToken>> lexerResult = lexerBuilder.AlphaId(MyToken.ID)
            .IgnoreEol(true)        // ignore end of lines
            .IgnoreWhiteSpace(true) // ignore white spaces
            .Int(MyToken.INT)
            .SingleLineComment(MyToken.COMMENT, "#").OnChannel(Channels.Main) // bring comments back on the main channel
            .Sugar(MyToken.DOLLAR, "$").PushToMode("money")                   // $ opens the money mode
            .UpTo(MyToken.MONEY, "€").WithModes("money")
            .Sugar(MyToken.EURO, "€").WithModes("money").PopMode()            // € closes the money mode
            .Build("en");
        if (lexerResult.IsOk)
        {
            ILexer<MyToken> lexer = lexerResult.Result;
            lexer.Tokenize(@"
identifier 42
# comment
$ money content €
");
        }
    }
}
```