Workaround for lack of word boundaries #415
Comments
Hello @ceronman, thank you for this interesting question! I think your approach is fine, but I'd suggest another way: define the whole grammar with tokens/regexes, and let the lexer return |
@jeertmans I think @ceronman's question is how you can make sure that an input like |
Well @martinstuder, another solution is to have a lexer with tokens for every possible case you want to cover, and you can use callbacks if you want to return errors instead of tokens. In this example, you need to explicitly write a token that matches a wrong identifier. The issue is that you then need to handle special cases, like strings or code comments, which usually require very complex regexes if you want to rely only on tokens.
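To make the "explicit token for a wrong identifier" idea concrete, here is a minimal sketch that uses only the Rust standard library rather than Logos itself. The names `Token`, `InvalidIdentifier`, `classify`, and `lex` are hypothetical and do not come from the thread or from the Logos API. The sketch assumes that a "word" is a maximal run of ASCII letters, digits, and underscores, and it skips everything else. A token-only Logos grammar would presumably get the same effect from a pattern along the lines of `[0-9]+[a-zA-Z_][a-zA-Z0-9_]*`, which wins by longest match.

```rust
/// Token kinds for this sketch. `InvalidIdentifier` plays the role of the
/// explicit "wrong identifier" token (a hypothetical name).
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Identifier(&'a str),
    Constant(&'a str),
    InvalidIdentifier(&'a str),
}

/// Word characters in the regex `\w` sense, restricted to ASCII.
fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Classify one maximal run of word characters.
fn classify(word: &str) -> Token<'_> {
    match word.chars().next() {
        // Starts with a digit: a constant only if every character is a digit.
        Some(c) if c.is_ascii_digit() => {
            if word.chars().all(|c| c.is_ascii_digit()) {
                Token::Constant(word)
            } else {
                Token::InvalidIdentifier(word)
            }
        }
        // Starts with a letter or underscore: an ordinary identifier.
        _ => Token::Identifier(word),
    }
}

/// Split the input on non-word characters and classify each run.
/// Punctuation and whitespace are skipped in this sketch.
fn lex(input: &str) -> Vec<Token<'_>> {
    input
        .split(|c: char| !is_word_char(c))
        .filter(|w| !w.is_empty())
        .map(classify)
        .collect()
}
```

With this sketch, `lex("123foo")` produces a single `InvalidIdentifier("123foo")` instead of a constant followed by an identifier, so the caller can turn that token into an error. As the comment above notes, strings and comments would need their own handling before this splitting step.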
Hi,
This is just a question, not a bug report. I'm using Logos to write a lexer for the C programming language. I'm following the book "Writing a C Compiler" by Nora Sandler.
Something unusual about the book is that the lexical grammar of its lexer uses word boundaries. So, for example, there are identifiers and constants:
Now, the reason for using word boundaries is that an input like `123foo` should be a lexical error. I find this a bit unusual, because I think this kind of error is usually caught at the parser level, but here we're trying to catch it at the lexer level.

Using Logos, I initially had something like this:
But this of course has the problem that it would not catch the error mentioned above. I'm wondering what could be the best workaround for this use case. Currently, what I'm doing is the following:
This kinda works, but it looks a bit ugly to me. I'm using a callback that returns `None` to signal the error. I'm wondering if there is a better way to do that. I saw some examples using `#[error]`, but it seems that doesn't work in newer versions of Logos.

Are there any better ideas?
Thanks in advance.
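For readers following along, the word-boundary rule described in this question can be sketched by hand with only the Rust standard library. This is not the code from the original post and it does not use the Logos API. `LexError` and `lex_constant` are hypothetical names, and the sketch assumes ASCII-only word characters. It emulates a pattern like `[0-9]+\b` anchored at the start of the input: after consuming the digits, it rejects the match if the next character is still a word character.

```rust
#[derive(Debug, PartialEq)]
enum LexError {
    /// A constant was immediately followed by a word character, e.g. `123foo`.
    /// `at` is the byte offset where the boundary was expected.
    MissingWordBoundary { at: usize },
    /// The input did not start with a digit at all.
    NotAConstant,
}

/// Word characters in the regex `\w` sense, restricted to ASCII.
fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Try to lex an integer constant at the start of `input`.
/// On success, returns the matched digits and the remaining input.
fn lex_constant(input: &str) -> Result<(&str, &str), LexError> {
    // Find the end of the leading run of ASCII digits.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(LexError::NotAConstant);
    }
    let (digits, rest) = input.split_at(end);
    // Word-boundary check: the next character must not be a word character.
    match rest.chars().next() {
        Some(c) if is_word_char(c) => Err(LexError::MissingWordBoundary { at: end }),
        _ => Ok((digits, rest)),
    }
}
```

Here `lex_constant("123;")` succeeds with `("123", ";")`, while `lex_constant("123foo")` fails with `MissingWordBoundary { at: 3 }`, which is the lexer-level error the book asks for. The same check could live in a callback attached to the constant pattern, at the cost of inspecting the remaining input manually.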