Add option to change error constructor #445

mysteriouslyseeing · 2024-12-02T13:46:48Z

Closes #444.

mysteriouslyseeing · 2024-12-02T14:34:26Z

I'm not sure error_callback is the right name for the attribute. Would error_constructor be more clear?

jeertmans · 2024-12-02T15:52:50Z

Hi @mysteriouslyseeing, thanks for your PR!

What about using a similar syntax to skip:

#[logo(error = <ErrorType>)]`
#[logo(error(<ErrorType> [, callback]))]

?

E.g., with the new syntax:

#[logo(error(Error, callback))]

`#[logos(error = SomeType)]` `#[logos(error_callback = callback)]` is now expressed as `#[logos(error(SomeType, callback))]`

jeertmans · 2024-12-06T17:14:06Z

book/src/attributes/logos.md

@@ -34,7 +34,8 @@ The type `ErrorType` can be any type that implements `Clone`, `PartialEq`,
 `Default` and `From<E>` for each callback's error type.

 `ErrorType` must implement the `Default` trait because invalid tokens, i.e.,
-literals that do not match any variant, will produce `Err(ErrorType::default())`.
+literals that do not match any variant, will produce `Err(ErrorType::default())`,
+unless you provide a callback with the alternate syntax `#[logos(error(ErrorType, callback = ...))]`


Sorry for the late review. Just one last comment: can you add a bit more details about the callback (signature) and maybe link to the example you wrote? Otherwise, users will not know it is documented :-)

Otherwise, it's all good to be merged!

I actually just changed the custom_error example to use the new error syntax instead of adding a new one, and the example is already linked in the book

mysteriouslyseeing · 2024-12-07T15:03:47Z

Just a heads-up that adding a defaulted method to a trait (which this PR does) is listed as a 'possibly-breaking' change in the cargo book's SemVer section.

jeertmans · 2024-12-10T11:20:15Z

Just a heads-up that adding a defaulted method to a trait (which this PR does) is listed as a 'possibly-breaking' change in the cargo book's SemVer section.

Good point, meaning that it should be released under a version that either bumps major or minor part.

jeertmans · 2024-12-10T11:26:45Z

According to CI's benchmarks, this PR results in a notable decrease in performance, but I don't really see why: https://github.com/maciejhirsz/logos/actions/runs/12214084915?pr=445.

mysteriouslyseeing · 2024-12-10T17:10:21Z

I ran the benchmarks on my machine and got this:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.19    618.1±2.13ns  1201.8 MB/sec    1.00    519.2±2.93ns  1431.0 MB/sec
count_ok/keywords_operators_and_punctators    1.13  1711.9±11.62ns  1187.2 MB/sec    1.00  1511.9±26.52ns  1344.2 MB/sec
count_ok/strings                              1.00    409.8±1.47ns  2027.0 MB/sec    1.38    565.8±3.83ns  1468.1 MB/sec
iterate/identifiers                           1.14   590.5±11.93ns  1258.2 MB/sec    1.00    516.1±3.76ns  1439.5 MB/sec
iterate/keywords_operators_and_punctators     1.07  1612.6±16.14ns  1260.3 MB/sec    1.00  1501.0±16.17ns  1354.0 MB/sec
iterate/strings                               1.00    410.9±2.81ns  2021.8 MB/sec    1.38    567.5±4.36ns  1463.6 MB/sec

Looks like both strings tests are definitely slower given both the benchmarks. I've got no idea why.

jeertmans · 2024-12-10T17:13:40Z

I ran the benchmarks on my machine and got this:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.19    618.1±2.13ns  1201.8 MB/sec    1.00    519.2±2.93ns  1431.0 MB/sec
count_ok/keywords_operators_and_punctators    1.13  1711.9±11.62ns  1187.2 MB/sec    1.00  1511.9±26.52ns  1344.2 MB/sec
count_ok/strings                              1.00    409.8±1.47ns  2027.0 MB/sec    1.38    565.8±3.83ns  1468.1 MB/sec
iterate/identifiers                           1.14   590.5±11.93ns  1258.2 MB/sec    1.00    516.1±3.76ns  1439.5 MB/sec
iterate/keywords_operators_and_punctators     1.07  1612.6±16.14ns  1260.3 MB/sec    1.00  1501.0±16.17ns  1354.0 MB/sec
iterate/strings                               1.00    410.9±2.81ns  2021.8 MB/sec    1.38    567.5±4.36ns  1463.6 MB/sec

Looks like both strings tests are definitely slower given both the benchmarks. I've got no idea why.

Looks like one of the test also increased significantly, that's also weird. I send an e-mail to Logos' author to see if he can enable CodSpeed, which might provide more accurate benchmarks.

mysteriouslyseeing · 2024-12-11T09:30:29Z

The problem is on lines 381 and 385 of src/lexer.rs:

#[cfg(not(feature = "forbid_unsafe"))]
{
    self.token = core::mem::ManuallyDrop::new(Some(Err(Token::make_error(self))));
}
#[cfg(feature = "forbid_unsafe")]
{
    self.token = Some(Err(Token::make_error(self)));
}

Previously, Token::make_error(self) was Token::Error::default().

Maybe the problem is that the compiler can't optimise that like it used to? Setting self.token and creating the error both require references to self now - but surely it should be able to see that the reference isn't actually used?

mysteriouslyseeing · 2024-12-11T09:33:37Z

Here are my benchmarks after reverting that specific line:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.00    619.9±5.36ns  1198.5 MB/sec    1.00    618.1±1.59ns  1201.8 MB/sec
count_ok/keywords_operators_and_punctators    1.00  1709.8±10.51ns  1188.6 MB/sec    1.00  1714.4±13.83ns  1185.4 MB/sec
count_ok/strings                              1.00    410.1±0.73ns  2025.7 MB/sec    1.00    409.9±0.89ns  2026.7 MB/sec
iterate/identifiers                           1.01    591.2±4.63ns  1256.5 MB/sec    1.00    587.3±1.76ns  1264.9 MB/sec
iterate/keywords_operators_and_punctators     1.00  1620.6±21.71ns  1254.1 MB/sec    1.00  1613.9±17.77ns  1259.2 MB/sec
iterate/strings                               1.00    409.9±1.82ns  2026.3 MB/sec    1.00   409.2±10.04ns  2030.1 MB/sec

jeertmans

Thanks for your analysis @mysteriouslyseeing! Let's see if hinting the compiler that it should inline the default implementation can help, see my suggestion.

jeertmans · 2024-12-11T10:41:14Z

src/lib.rs

+    fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {
+        Self::Error::default()
+    }


Suggested change

fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {

Self::Error::default()

}

#[inline(always)]

fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {

Self::Error::default()

}

We can probably try this to see if that helps the compiler.

Performance is unchanged, unfortunately:

group default_before default_changes ----- -------------- --------------- count_ok/identifiers 1.20 620.0±18.12ns 1198.2 MB/sec 1.00 518.1±28.66ns 1433.8 MB/sec count_ok/keywords_operators_and_punctators 1.12 1731.6±95.53ns 1173.6 MB/sec 1.00 1552.2±150.95ns 1309.3 MB/sec count_ok/strings 1.00 410.6±6.04ns 2023.2 MB/sec 1.40 574.0±12.94ns 1447.1 MB/sec iterate/identifiers 1.15 594.5±30.62ns 1249.7 MB/sec 1.00 518.0±27.87ns 1434.3 MB/sec iterate/keywords_operators_and_punctators 1.04 1623.9±49.04ns 1251.5 MB/sec 1.00 1559.1±111.50ns 1303.5 MB/sec iterate/strings 1.00 412.3±32.70ns 2014.8 MB/sec 1.39 571.8±16.91ns 1452.7 MB/sec

mysteriouslyseeing · 2024-12-11T13:14:57Z

To get it to work, I changed the signature and default implementation of make_error:

    #[inline(always)]
    #[doc(hidden)]
    fn make_error(lexer: &mut Lexer<'source, Self>) {
        use internal::LexerInternal as _;
        lexer.set_error(Self::Error::default())
    }

This uses a new function on LexerInternal:

    #[inline]
    fn set_error(&mut self, error: Token::Error) {
        #[cfg(not(feature = "forbid_unsafe"))]
        {
            self.token = core::mem::ManuallyDrop::new(Some(Err(error)));
        }
        #[cfg(feature = "forbid_unsafe")]
        {
            self.token = Some(Err(error));
        }
    }

Now the error function on LexerInternal looks like this:

    #[inline]
    fn error(&mut self) {
        self.token_end = self.source.find_boundary(self.token_end);
        Token::make_error(self);
    }

Mixing around the functions like this does make it harder to understand but it does fix the performance regression:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.00   620.0±18.12ns  1198.2 MB/sec    1.00    617.1±1.24ns  1203.8 MB/sec
count_ok/keywords_operators_and_punctators    1.03  1731.6±95.53ns  1173.6 MB/sec    1.00   1681.4±9.95ns  1208.7 MB/sec
count_ok/strings                              1.00    410.6±6.04ns  2023.2 MB/sec    1.00    409.9±0.73ns  2026.3 MB/sec
iterate/identifiers                           1.02   594.5±30.62ns  1249.7 MB/sec    1.00    583.4±1.41ns  1273.4 MB/sec
iterate/keywords_operators_and_punctators     1.00  1623.9±49.04ns  1251.5 MB/sec    1.00  1620.8±22.91ns  1253.9 MB/sec
iterate/strings                               1.01   412.3±32.70ns  2014.8 MB/sec    1.00    407.5±2.00ns  2038.4 MB/sec

And here is the comparison between the old and new changes:

group                                         default_changes_old                     default_changes
-----                                         -------------------                     ---------------
count_ok/identifiers                          1.00   518.1±28.66ns  1433.8 MB/sec     1.19    618.0±2.73ns  1202.0 MB/sec
count_ok/keywords_operators_and_punctators    1.00  1552.2±150.95ns  1309.3 MB/sec    1.08   1681.9±9.45ns  1208.3 MB/sec
count_ok/strings                              1.40   574.0±12.94ns  1447.1 MB/sec     1.00    410.3±0.75ns  2024.5 MB/sec
iterate/identifiers                           1.00   518.0±27.87ns  1434.3 MB/sec     1.14    589.9±4.94ns  1259.3 MB/sec
iterate/keywords_operators_and_punctators     1.00  1559.1±111.50ns  1303.5 MB/sec    1.04  1627.9±33.51ns  1248.4 MB/sec
iterate/strings                               1.39   571.8±16.91ns  1452.7 MB/sec     1.00    411.7±4.02ns  2017.7 MB/sec

It would be nice to know how identifiers was faster so we can get both performance increases but that should probably happen in another PR.

Replaced usages with `set`

jeertmans · 2024-12-12T10:44:10Z

Thanks for your efforts @mysteriouslyseeing! I am not sure to understand your two benchmark results: the first is this PR's changes, and the second is the change between last commit and second-last commit?

mysteriouslyseeing · 2024-12-12T10:45:27Z

Thanks for your efforts @mysteriouslyseeing! I am not sure to understand your two benchmark results: the first is this PR's changes, and the second is the change between last commit and second-last commit?

Exactly, yes

jeertmans · 2024-12-12T11:43:33Z

Hum, benchmarks results are, surprising, as they seem to both indicate performance increase and decrease: https://github.com/maciejhirsz/logos/actions/runs/12279903326/attempts/2#summary-34310133559.

mysteriouslyseeing · 2024-12-12T12:59:56Z

Hmmm, that is strange. Looks like it's mostly increases? Identifiers are worse, but only without forbid_unsafe?
Unfortunately I probably can't investigate because my local benchmarks are showing little to no difference in performance.

jeertmans · 2024-12-18T12:41:30Z

Hmmm, that is strange. Looks like it's mostly increases? Identifiers are worse, but only without forbid_unsafe? Unfortunately I probably can't investigate because my local benchmarks are showing little to no difference in performance.

I'll hold this for one or two more weeks, to see if @maciejhirsz can enable CodSpeed (it should provide better instruments for precise benchmarking). After that, I will try to run benchmark locally and see if it is safe to merge :-) Sorry for the delay!

Added initial functionality to error callback

0b53340

mysteriouslyseeing changed the title ~~Added initial functionality to error callback~~ Add option to change error constructor Dec 2, 2024

mysteriouslyseeing added 4 commits December 3, 2024 01:15

Changed Logos::make_error to take a mutable reference

63169b5

Added error_callback usage to custom_error.rs

fe62bcb

changed custom_error test to use error_callback

5d80583

Updated book

2ee7e14

mysteriouslyseeing marked this pull request as ready for review December 2, 2024 14:34

mysteriouslyseeing added 3 commits December 3, 2024 13:15

Removed temporary example

a0a358e

Removed error_callback attr, added to error

582cfcc

`#[logos(error = SomeType)]` `#[logos(error_callback = callback)]` is now expressed as `#[logos(error(SomeType, callback))]`

Edit book to reflect syntax changes

7834597

jeertmans reviewed Dec 6, 2024

View reviewed changes

Added mini example to book

ac220e4

jeertmans reviewed Dec 11, 2024

View reviewed changes

mysteriouslyseeing added 3 commits December 11, 2024 22:58

Added #[inline(always)]

d7b1e7c

Fixed performance regression

227eed4

Added #[doc(hidden)] to Logos::make_error

9835896

mysteriouslyseeing added 2 commits December 12, 2024 02:42

Removed redundant method set_error

ac45573

Replaced usages with `set`

CallbackResults now use Token::make_error

6ae71c4

sokorototo mentioned this pull request Jan 6, 2025

Custom Error Default trait with extra information from the Lexer as parameters #462

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to change error constructor #445

Add option to change error constructor #445

mysteriouslyseeing commented Dec 2, 2024

mysteriouslyseeing commented Dec 2, 2024

jeertmans commented Dec 2, 2024

jeertmans Dec 6, 2024

mysteriouslyseeing Dec 7, 2024

mysteriouslyseeing commented Dec 7, 2024

jeertmans commented Dec 10, 2024

jeertmans commented Dec 10, 2024

mysteriouslyseeing commented Dec 10, 2024

jeertmans commented Dec 10, 2024

mysteriouslyseeing commented Dec 11, 2024

mysteriouslyseeing commented Dec 11, 2024

jeertmans left a comment

jeertmans Dec 11, 2024

mysteriouslyseeing Dec 11, 2024

mysteriouslyseeing commented Dec 11, 2024

jeertmans commented Dec 12, 2024 •

edited

Loading

mysteriouslyseeing commented Dec 12, 2024

jeertmans commented Dec 12, 2024

mysteriouslyseeing commented Dec 12, 2024 •

edited

Loading

jeertmans commented Dec 18, 2024

Add option to change error constructor #445

Are you sure you want to change the base?

Add option to change error constructor #445

Conversation

mysteriouslyseeing commented Dec 2, 2024

mysteriouslyseeing commented Dec 2, 2024

jeertmans commented Dec 2, 2024

jeertmans Dec 6, 2024

Choose a reason for hiding this comment

mysteriouslyseeing Dec 7, 2024

Choose a reason for hiding this comment

mysteriouslyseeing commented Dec 7, 2024

jeertmans commented Dec 10, 2024

jeertmans commented Dec 10, 2024

mysteriouslyseeing commented Dec 10, 2024

jeertmans commented Dec 10, 2024

mysteriouslyseeing commented Dec 11, 2024

mysteriouslyseeing commented Dec 11, 2024

jeertmans left a comment

Choose a reason for hiding this comment

jeertmans Dec 11, 2024

Choose a reason for hiding this comment

mysteriouslyseeing Dec 11, 2024

Choose a reason for hiding this comment

mysteriouslyseeing commented Dec 11, 2024

jeertmans commented Dec 12, 2024 • edited Loading

mysteriouslyseeing commented Dec 12, 2024

jeertmans commented Dec 12, 2024

mysteriouslyseeing commented Dec 12, 2024 • edited Loading

jeertmans commented Dec 18, 2024

jeertmans commented Dec 12, 2024 •

edited

Loading

mysteriouslyseeing commented Dec 12, 2024 •

edited

Loading