I'm currently trying to write a lexer/parser for a custom scripting language. However, identifiers may contain alphanumeric characters (but must not start with a digit) as well as "_", ".", ":" and a number of other characters. Is there any way to extend the parsing logic for the EDIT: probably I'd need to write a generic lexer extension then?
Hello @jbartlau, there are 2 ways to get it done.
For your definition it would look like this:
public enum Discussion468Lexer
{
    // starting with a letter or '_', '.' or ':'
    // next the same chars plus digits
    [CustomId("_.:a-zA-Z","_.:0-9a-zA-Z")]
    INDENTIFIER
}
Here is a simple test case:
public class Discussion468Tests
{
    // Tokenizes source and checks that it is lexed as one identifier (isOk == true)
    // or that lexing fails (isOk == false).
    private static void CheckId(ILexer<Discussion468Lexer> lexer, string source, bool isOk)
    {
        var lexingResult = lexer.Tokenize(source);
        if (isOk)
        {
            Check.That(lexingResult).IsOkLexing();
            var tokens = lexingResult.Tokens;
            Check.That(tokens).IsNotNull();
            // two tokens expected: the identifier and the end-of-stream token
            Check.That(tokens).CountIs(2);
            var idToken = tokens[0];
            Check.That(idToken.TokenID).IsEqualTo(Discussion468Lexer.INDENTIFIER);
            Check.That(idToken.Value).IsEqualTo(source);
        }
        else
        {
            Check.That(lexingResult).Not.IsOkLexing();
        }
    }

    [Fact]
    public static void Test468()
    {
        var buildResult = LexerBuilder.BuildLexer<Discussion468Lexer>();
        Check.That(buildResult).IsOk();
        var lexer = buildResult.Result;
        Check.That(lexer).IsNotNull();
        CheckId(lexer, "_:word1234", true);
        CheckId(lexer, "word_1234.otherword", true);
        // a leading digit is not allowed by the start pattern
        CheckId(lexer, "1_:word1234", false);
    }
}
You can find the lexer and test case in commit ab0e2c4.
Hello @jbartlau, there is no regex matching. In fact, the lexer builder parses the pattern and builds an FSM, just as you would with a lexer extension, so there is no particular performance concern.
Regarding your experiment with a lexer extension, I agree that it is not the easiest part of the generic lexer. You tried to reuse the existing FSM for your needs, but you have to write your own FSM from scratch to match your pattern.
I've committed a sample with test cases; see 71145ac.
I would recommend that you use CustomId instead of an extension, which is not that easy to understand.
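To give a rough idea of what "an FSM that matches the pattern" amounts to, here is a minimal, self-contained sketch of a two-state machine that accepts the same identifiers as the CustomId pattern above. It does not use csly at all: the class `IdentifierFsm` and its methods `MatchLength` and `IsIdentifier` are hypothetical names made up for this illustration, and it is not how the lexer builder or a lexer extension is actually implemented.

```csharp
using System;

// Hypothetical illustration, not csly code: a two-state machine for
// identifiers whose first char is in "_.:a-zA-Z" and whose following
// chars are in "_.:0-9a-zA-Z".
public static class IdentifierFsm
{
    private enum State { Start, InIdentifier }

    private static bool IsAsciiLetter(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    // Characters allowed in the first position.
    private static bool IsStartChar(char c) =>
        c == '_' || c == '.' || c == ':' || IsAsciiLetter(c);

    // Characters allowed in every later position: start chars plus digits.
    private static bool IsRestChar(char c) =>
        IsStartChar(c) || (c >= '0' && c <= '9');

    // Returns the length of the identifier at the beginning of source,
    // or 0 when source does not start with a valid identifier.
    public static int MatchLength(string source)
    {
        var state = State.Start;
        var length = 0;
        foreach (var c in source)
        {
            var accepted = state == State.Start ? IsStartChar(c) : IsRestChar(c);
            if (!accepted) break;
            state = State.InIdentifier;
            length++;
        }
        return length;
    }

    // True when the whole string is a single identifier.
    public static bool IsIdentifier(string source) =>
        source.Length > 0 && MatchLength(source) == source.Length;
}
```

With this sketch, `IdentifierFsm.IsIdentifier("word_1234.otherword")` accepts the input, while a string with a leading digit such as "1_:word1234" is rejected in the start state. The CustomId attribute expresses the same two character sets declaratively, which is why it is the simpler option.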