Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add callback to skip #440

Open
mysteriouslyseeing opened this issue Nov 29, 2024 · 9 comments · May be fixed by #441
Open

Add callback to skip #440

mysteriouslyseeing opened this issue Nov 29, 2024 · 9 comments · May be fixed by #441
Labels
enhancement New feature or request nice to have

Comments

@mysteriouslyseeing
Copy link

Is it possible to add a callback to skip?

This would be useful if you are trying to count newlines, and you also want to skip them:

use logos::{Lexer, Logos, Skip};

#[derive(Debug, Clone, Copy, Default)]
struct Extras {
    line_num: usize,
}

#[derive(Logos)]
#[logos(skip r" \r")]
// Is there a way of adding a callback here?
#[logos(skip(regex = r"\n", callback = newline_callback))]
#[logos(extras = Extras)]
enum Token {
    #[regex("[a-zA-Z]", |lexer| { lexer.slice().chars().next().unwrap() })]
    Letter(char),
    #[regex("[0-9]", |lexer| { lexer.slice().parse::<u8>().unwrap() })]
    Number(u8),
}

fn newline_callback(lexer: &mut Lexer<Token>) -> Skip {
    lexer.extras.line_num += 1;
    Skip
}

I know you could add another variant like Token::Ignored, or attach the newline regex to another variant, but both those solutions feel messy, and it'd be nice to have all the skip logic in one place.

@jeertmans
Copy link
Collaborator

Hi @mysteriouslyseeing!

As you said, this is totally possible to do this, but you need to do that with an enum variant.

If you have some time, you can try adding support for a callback for skip, this shouldn't be hard.

All you need (I think), is to modify the parser to also read optional callbacks

("skip", |parser, span, value| match value {
NestedValue::Literal(lit) => {
if let Some(literal) = parser.parse_literal(Lit::new(lit)) {
parser.skips.push(literal);
}
}
_ => {
parser.err("Expected: #[logos(skip \"regex literal\")]", span);
}

You can take inspiration for what is done for parsing token / regex definitions:

/// Parse attribute definition of a token:
///
/// + `#[token(literal[, callback])]`
/// + `#[regex(literal[, callback])]`
pub fn parse_definition(&mut self, attr: &mut Attribute) -> Option<Definition> {
let mut nested = self.parse_attr(attr)?;
let literal = match nested.parsed::<Lit>()? {
Ok(lit) => self.parse_literal(lit)?,
Err(err) => {
self.err(err.to_string(), err.span());
return None;
}
};
let mut def = Definition::new(literal);
for (position, next) in nested.enumerate() {
match next {
Nested::Unexpected(tokens) => {
self.err("Unexpected token in attribute", tokens.span());
}
Nested::Unnamed(tokens) => match position {
0 => def.callback = self.parse_callback(tokens),
_ => {
self.err(
"\
Expected a named argument at this position\n\
\n\
hint: If you are trying to define a callback here use: callback = ...\
",
tokens.span(),
);
}
},
Nested::Named(name, value) => {
def.named_attr(name, value, self);
}
}
}
Some(def)
}

@jeertmans jeertmans added enhancement New feature or request nice to have labels Nov 29, 2024
@mysteriouslyseeing
Copy link
Author

mysteriouslyseeing commented Nov 29, 2024

I've had a crack at it here

@mysteriouslyseeing
Copy link
Author

Looking at erroring, it seems a bit odd that there isn't a callback for the case when no match was found. I think it just uses Error::default(), when it would be nice if the error could gather information from the lexer, such as span and line number. Maybe a #[logos(error_callback = callback)] would also make sense?

@mysteriouslyseeing
Copy link
Author

I've done an implementation of the above here

@jeertmans
Copy link
Collaborator

I've done an implementation of the above here

This is a branch, do you have a concrete example I could read? :-)

@mysteriouslyseeing
Copy link
Author

mysteriouslyseeing commented Nov 30, 2024

Sure, this is an example a possible error syntax:

fn main() {
    let input = "1 * 2 - 8 % 4";
    for res in Token::lexer(input) {
        match res {
            Ok(_) => (),
            Err(e) => eprintln!("{e}"),
        }
    }
}

use std::{fmt::Display, ops::Range};


use logos::Logos;

#[derive(Debug, Clone, Default, PartialEq)]
struct Error {
    message: String,
    index: Range<usize>,
}

impl Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!{f,
            "Error at {:?}: {}",
            self.index,
            self.message,
        }
    }
}

#[derive(Logos, Debug)]
#[logos(error = Error)]
#[logos(error_callback = |lex| Error {
    message: format!("Invalid character '{}'", lex.slice()),
    index: lex.span(),
})]
#[logos(skip(r" "))]
enum Token {
    #[regex(r"[\*/+-]")]
    Operator,
    #[regex("[0-9]+")]
    Number
}

which prints Error at 10..11: Invalid character '%'

That example without the enhancement, while not impossible to accomplish, is significantly more clunky - you have to call Iterator::next manually so you don't consume the lexer.

@jeertmans
Copy link
Collaborator

Ok this makes sense, I think this would be a great new feature!

@mysteriouslyseeing
Copy link
Author

Should I make a new issue/pull request for this?

@jeertmans
Copy link
Collaborator

Should I make a new issue/pull request for this?

Yes please :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request nice to have
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants