b3b00 edited this page Dec 24, 2024 · 3 revisions

Fluent lexer builder

The fluent lexer builder still needs an enum to declare the different tokens, but this enum does not need to be decorated with C# attributes. Instead, a chain of API calls defines the tokens.

As the purpose of a fluent API is to allow fluent use, we will define every fluent method in detail.

The fluent lexer builder is declared as follows:

var lexerBuilder = FluentLexerBuilder<MyToken>.NewBuilder();

lexer configuration

First, we can configure the lexer with the following fluent methods:

  • IgnoreEol(bool ignore) : ignore EOL if parameter is true
  • IgnoreWhiteSpace(bool ignore) : ignore white spaces if parameter is true
  • IsIndentationAware(bool ignore) : insert INDENT and UINDENT tokens to support indentation-aware languages if parameter is true. See Indented languages
  • IgnoreKeywordCase(bool ignore) : ignore keywords casing if parameter is true
  • WithCallBack(IN tokenId, Func<Token<IN>, Token<IN>> callback) : defines a callback to be called when a token tokenId is scanned. See Generic Lexer Callbacks.
  • UseLexerPostProcessor(LexerPostProcess<IN> lexerPostProcessor) : defines a lexer post processor to be called when tokenization ends. See Lexer post processing
  • UseExtensionBuilder(Action<IN, LexemeAttribute, GenericLexer<IN>> extensionBuilder) : defines a lambda used to extend the generic lexer. See extending the generic lexer
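The configuration methods above can be chained before the token definitions. The following is a minimal sketch, not a reference implementation: the `ConfigToken` enum and the `LexerConfigSketch` class are hypothetical, the `using` directives are an assumption about the csly namespace layout, and it assumes configuration calls can be made directly on the object returned by `NewBuilder()`.

```csharp
// Assumption: these namespaces match the csly package layout; adjust to your version.
using sly.buildresult;
using sly.lexer;
using sly.lexer.fluent;

// Hypothetical token enum, used for illustration only.
public enum ConfigToken { ID, INT }

public static class LexerConfigSketch
{
    public static BuildResult<ILexer<ConfigToken>> BuildLexer()
    {
        return FluentLexerBuilder<ConfigToken>.NewBuilder()
            .IgnoreEol(true)          // end of lines are skipped
            .IgnoreWhiteSpace(true)   // white spaces are skipped
            .IgnoreKeywordCase(true)  // keywords match regardless of casing
            .AlphaId(ConfigToken.ID)  // token definitions follow the configuration
            .Int(ConfigToken.INT)
            .Build("en");
    }
}
```

Keeping all configuration calls at the top of the chain makes the lexer-wide options easy to spot, although the full example at the end of this page shows they can also appear after a token definition.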

token definitions

Here is the list of the methods used to define tokens. See Generic Lexer for more detail on generic tokens.

  • AlphaNumId(IN tokenId)

  • AlphaId(IN tokenId)

  • AlphaNumDashId(IN tokenId)

  • CustomId(IN tokenId, string start, string end)

  • Double(IN tokenId, string decimalDelimiter )

  • Int(IN tokenId)

  • Integer(IN tokenId)

  • Sugar(IN tokenId, string token)

  • Date(IN tokenId, DateFormat format, char separator)

  • Hexa(IN tokenId, string prefix)

  • Keyword(IN tokenId, string token)

  • Keyword(IN tokenId, string[] tokens)

  • String(IN tokenId, string delimiter ="\"", string escapeChar = "\\")

  • Character(IN tokenId, string delimiter ="'", string escapeChar = "\\")

  • SingleLineComment(IN tokenId, string start, bool doNotIgnore = false)

  • MultiLineComment(IN tokenId, string start, string end, bool doNotIgnore = false)

  • UpTo(IN tokenId, string pattern)

  • UpTo(IN tokenId, params string[] patterns)

  • Extension(IN tokenId)

  • Regex(IN tokenId, string regex, bool isSkippable = false, bool isEol = false) : defines a regex token. This cannot be used together with generic tokens.
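As an illustration, a few of the token definition methods listed above could be combined as in the sketch below. The `ExprToken` enum and the `TokenDefinitionSketch` class are hypothetical, and the `using` directives are an assumption about the csly namespace layout. Only generic tokens are used, since `Regex` tokens cannot be mixed with them.

```csharp
// Assumption: these namespaces match the csly package layout; adjust to your version.
using sly.buildresult;
using sly.lexer;
using sly.lexer.fluent;

// Hypothetical token enum, used for illustration only.
public enum ExprToken { IF, ELSE, ID, NUMBER, TEXT, ASSIGN, LINE_COMMENT }

public static class TokenDefinitionSketch
{
    public static BuildResult<ILexer<ExprToken>> BuildLexer()
    {
        return FluentLexerBuilder<ExprToken>.NewBuilder()
            .IgnoreWhiteSpace(true)
            .Keyword(ExprToken.IF, "if")                     // single keyword
            .Keyword(ExprToken.ELSE, "else")
            .AlphaNumId(ExprToken.ID)                        // identifier
            .Double(ExprToken.NUMBER, ".")                   // "." as decimal delimiter
            .String(ExprToken.TEXT)                          // default delimiter and escape char
            .Sugar(ExprToken.ASSIGN, "=")                    // fixed lexeme
            .SingleLineComment(ExprToken.LINE_COMMENT, "//") // comment starting with //
            .Build("en");
    }
}
```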

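Labels, channels and modes

After every token definition, the following methods can be chained to attach labels, modes or a channel to that token: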

  • WithLabel(string lang, string label)

  • WithLabels(params (string lang, string label)[] labels)

  • WithModes(params string[] modes)

  • OnChannel(int channel)

  • PopMode()

  • PushToMode(string targetMode)
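The sketch below shows how labels might be attached right after the token definitions they apply to. The `LabelToken` enum, the `LabelSketch` class and the label texts are hypothetical, and the `using` directives are an assumption about the csly namespace layout.

```csharp
// Assumption: these namespaces match the csly package layout; adjust to your version.
using sly.buildresult;
using sly.lexer;
using sly.lexer.fluent;

// Hypothetical token enum, used for illustration only.
public enum LabelToken { IF, ID }

public static class LabelSketch
{
    public static BuildResult<ILexer<LabelToken>> BuildLexer()
    {
        return FluentLexerBuilder<LabelToken>.NewBuilder()
            .IgnoreWhiteSpace(true)
            .Keyword(LabelToken.IF, "if")
                // several labels at once, one per language
                .WithLabels(("en", "if keyword"), ("fr", "mot-clé si"))
            .AlphaId(LabelToken.ID)
                // a single label for one language
                .WithLabel("en", "identifier")
            .Build("en");
    }
}
```

Indenting the modifier calls under the token they decorate is only a readability convention; the chain itself is flat.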

building the lexer

To get the final ILexer<IN> object, call the Build(string lang) method. The lang parameter selects the i18n configuration for error messages.

var lexerBuilder = FluentLexerBuilder<MyToken>.NewBuilder();
// .... token definitions
BuildResult<ILexer<MyToken>> lexerResult = lexerBuilder.Build("en");
// check if lexerResult is OK
if (lexerResult.IsOk) {
    ILexer<MyToken> lexer = lexerResult.Result;
}

a simple fluently defined lexer

public enum MyToken {
    ID,
    INT,
    COMMENT,
    DOLLAR,
    MONEY,
    EURO
}

public class FluentLexerTest
{
    public void LexerTest()
    {

        var lexerBuilder = FluentLexerBuilder<MyToken>.NewBuilder();

        BuildResult<ILexer<MyToken>> lexerResult = lexerBuilder.AlphaId(MyToken.ID)
            .IgnoreEol(true)   // ignore end of lines
            .IgnoreWhiteSpace(true) // ignore white spaces
            .Int(MyToken.INT)
            .SingleLineComment(MyToken.COMMENT, "#").OnChannel(Channels.Main) // brings back comment on main channel
            .Sugar(MyToken.DOLLAR,"$").PushToMode("money") // $ opens mode money
            .UpTo(MyToken.MONEY, "€").WithModes("money")
            .Sugar(MyToken.EURO, "€").WithModes("money").PopMode() // € closes mode money
            .Build("en");
        if (lexerResult.IsOk)
        {
            ILexer<MyToken> lexer = lexerResult.Result;
            lexer.Tokenize(@"
    identifier 42
# comment
$ money content € 
");
        }
    }

}