Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add callbacks and priority for #[logos(skip)] #441

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

mysteriouslyseeing
Copy link

@mysteriouslyseeing mysteriouslyseeing commented Nov 29, 2024

Closes #440.

Syntax is same as #[regex] and #[token]:
#[logos(skip(literal[, callback[ = callback]][, priority = priority]))]

Also added an example with usage:

fn main() {
    let mut lexer = Token::lexer("abc 123\nab( |23\nAbc 123");
    let mut out_tokens = Vec::new();

    while let Some(token_result) = lexer.next() {
        if let Ok(token) = token_result {
            out_tokens.push(token);
        } else {
            // Oh no! There was an error!
            eprintln!("Unrecognised character on line {}", lexer.extras.line_num + 1);
        }
    }
}

use logos::{Lexer, Logos, Skip};

#[derive(Debug, Clone, Copy, Default)]
struct Extras {
    line_num: usize,
}

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \r]")]
#[logos(skip(r"\n", callback = newline_callback, priority = 3))]
#[logos(extras = Extras)]
enum Token {
    #[regex("[a-z]+")]
    Letters,
    #[regex("[0-9]+")]
    Numbers,
}

fn newline_callback(lexer: &mut Lexer<Token>) -> Skip {
    lexer.extras.line_num += 1;
    Skip
}

A lot of the logic is copied and pasted from Definition - maybe it would be better to refactor them both to use common code? Let me know.

Syntax is same as #[regex] and #[token]:
`#[logos(skip(literal, callback = callback, priority = priority))]`
@mysteriouslyseeing mysteriouslyseeing changed the title Callbacks and priority for #[logos(skip)] added Add callbacks and priority for #[logos(skip)] Nov 29, 2024
@jeertmans
Copy link
Collaborator

Hi @mysteriouslyseeing, it looks great, thanks for your contribution!

Before I review this in depth, could you please:

  • add unit tests to check that all syntax variations are supported;
  • add documentation to the book so that people will know this feature is supported (you can also link to the example file you wrote)?

Thanks!

@jeertmans jeertmans added enhancement New feature or request book Related to the book labels Nov 30, 2024
Copy link
Collaborator

@jeertmans jeertmans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding tests and book docs!

Before I review other parts of the code, please check my first comments about the failing test :-)

Comment on lines 238 to 257
fn skip_comment<'src>(lexer: &mut Lexer<'src, Token<'src>>) {
let end = lexer
.remainder()
.find("-->")
.unwrap_or(lexer.remainder().len());
lexer.bump(end + 3);
}

#[test]
fn skip_callback_function() {
let mut lexer = Token::lexer("<foo> <!-- comment --> <bar>");
assert_eq!(lexer.next(), Some(Ok(Token::Tag("foo"))));
assert_eq!(lexer.next(), Some(Ok(Token::Tag("bar"))));
assert_eq!(lexer.next(), None);

let mut lexer = Token::lexer("<foo> <!-- unterminated comment");
assert_eq!(lexer.next(), Some(Ok(Token::Tag("foo"))));
// Errors not allowed from skips
assert_eq!(lexer.next(), None);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing, I think this is because you cannot bump end + 3 if the comment is unterminated. The callback logic should only pay attention to this edge case (which you hopefully tested here :-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I forgot to run cargo test in ./test. My bad, I'll fix that now.

Comment on lines 255 to 256
// Errors not allowed from skips
assert_eq!(lexer.next(), None);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean the error type is never returned? I am not sure to understand this comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was to do with the other branch, where skips could return errors. I've removed that line

let end = lexer
.remainder()
.find("-->")
.unwrap_or(lexer.remainder().len());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line with other comment, you can probably fix the text by changing this to .map(|id| id + 3).unwrap_or(lexer.remainer().len()); and bump(end).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mysteriouslyseeing
Copy link
Author

When I run cargo test in the ./test subdirectory, some doctests fail. They don't appear to have anything to do with my changes. Is that something to do with me?

@mysteriouslyseeing
Copy link
Author

mysteriouslyseeing commented Dec 2, 2024

To clarify, Clippy complains that

pub enum Callback {
    Label(TokenStream),
    Inline(Box<InlineCallback>),
    SkipWithCallback(SkipCallback),
    Skip(Span),
}

the SkipWithCallback variant ends with Callback, which is the type's name.

Now the enum is defined like so:

pub enum Callback {
    Label(TokenStream),
    Inline(Box<InlineCallback>),
    Skip(SkipCallback), // note these are swapped
    SkipEmpty(Span), // note these are swapped
}

which could be confusing to people previously familiar with the codebase. I've committed it like this, but if you'd rather revert to 294a7ab (before any of the switching around of the variants) that's totally fine by me.

@jeertmans
Copy link
Collaborator

be confusing to people previously familiar with the codebase. I've committed it like this, b

I think this is better to avoid breaking changes when possible (and changing the name of an enum variant is a breaking change), so I prefer reverting to previous name (and IMO it conveys the functionality better), and disable Clippy warning for this variant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
book Related to the book enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add callback to skip
2 participants