Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer error in string #158

Open
jamesdbrock opened this issue Mar 22, 2022 · 9 comments
Open

Defer error in string #158

jamesdbrock opened this issue Mar 22, 2022 · 9 comments

Comments

@jamesdbrock
Copy link
Member

string is potentially slow on failure because of the show call to the input string. Failure cases are hit very often, and if you are using string a lot this can be significant overhead. Deferred errors might be better in general.

Originally posted by @natefaubion in #144 (comment)

@jamesdbrock
Copy link
Member Author

This is a benchmark for failing a string parser 10,000 times. The second measurement is for a string parser modified so that it fails with a constant string error message.

Current situationConstant error string
runParser manyTill anyChar (string "x") 10000
mean   = 8.12 ms
stddev = 736.03 μs
min    = 7.52 ms
max    = 12.80 ms
runParser manyTill anyChar (string "x") 10000
mean   = 6.11 ms
stddev = 794.22 μs
min    = 5.49 ms
max    = 11.16 ms

@jamesdbrock
Copy link
Member Author

I think that deferring error message construction in the case of manyTill anyChar (string "x") would require us the change the type of

data ParseError = ParseError String Position

to

 data ParseError = ParseError (Unit -> String) Position 

?

@jamesdbrock
Copy link
Member Author

Also I think we don't even want the error functions to prepend "Expected " to everything, and we should remove that anyway.

withErrorMessage p msg = p <|> fail ("Expected " <> msg)

@jamesdbrock
Copy link
Member Author

And while we're at it should we accumulate a NonEmptyList of ParseErrors?

@chtenb
Copy link
Member

chtenb commented May 12, 2022

In what situation would there be more than 1 parser error? Or do you want to track all the backtracking decisions in that error list?

@natefaubion
Copy link
Contributor

natefaubion commented May 12, 2022

It's not clear to me there is value in having multiple errors with the current approach to error handling and recovery (or lack thereof). The point of the current "consumed" check is that it's a heuristic for generating a single, specific error. If you wanted to have multiple errors, you would want to remove the check, always backtrack, and then let the user decide which is most specific (potentially by how far it progressed). I personally don't think it's worth doing in this library without having a clear idea of what you wanted to accomplish and enable that is better than the status quo.

@natefaubion
Copy link
Contributor

Note, a "deferred" error doesn't necessarily need to be a thunk. It could potentially just be a structured data type that is constant-time to construct, and potentially only applies escaping rules when rendered.

@jamesdbrock
Copy link
Member Author

In what situation would there be more than 1 parser error?

It's not clear to me there is value in having multiple errors

Yeah, let's not do multiple errors.

Note, a "deferred" error doesn't necessarily need to be a thunk. It could potentially just be a structured data type that is constant-time to construct, and potentially only applies escaping rules when rendered.

What would that look like?

@natefaubion
Copy link
Contributor

Maybe something like:

data ParseError
  = UnexpectedInput { inputExpected :: Array String, inputSeen :: String }
  | ...
  | EmptyAlternative

I'm not sure what other data types would be there, but you might look at what kind of errors we throw and see if it makes sense for others to throw them. The most common is invariably unexpected input errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants