Returning more information on partial results and errors #162

Open
tgnottingham opened this issue Apr 18, 2024 · 0 comments

tgnottingham commented Apr 18, 2024

There are a couple of changes I've implemented locally that enable getting more information from the parser on partial results and errors. Unfortunately, they're all breaking changes. I'm curious if they might be accepted in a later version of httparse, however unlikely that may be.

For context, my use case calls for an incremental parser -- that is, one that can resume parsing from the start of a header field, rather than having to restart from the beginning of the start line when a previous parse only gave a partial result. httparse enables this with its ability to return partial data, but there are some difficult spots.

The following changes would make things easier:

  • Preserving parsed headers on partial results.

    When you parse with initialized headers and get a partial result, the headers array contains all of the parsed headers, but it isn't resized to the number of headers parsed. This isn't ideal, but it's workable -- you can scan for the first empty header name to find the end of the parsed headers.

    On the other hand, when you parse with uninitialized headers and get a partial result, the headers array doesn't contain the parsed headers. There's no way to safely retrieve the parsed headers as far as I know.

    Ideally, headers would be resized appropriately in both cases, avoiding the need to scan for the empty header name, and enabling retrieval of partial headers with uninitialized headers. I implemented this in Resize headers on partial parse #160, but realized that the nature of the breaking change might be too much to accept.

    Obviously, this can be worked around by using initialized headers and the scanning approach.

  • Preserving parsed headers on errors.

    I believe this is the exact same situation as with partial results. I've implemented the same improvements for errors as I did for partial results in Resize headers on partial parse #160, but haven't submitted a PR.

  • Returning number of parsed bytes on partial results.

    When a partial result is returned, it doesn't tell you how many bytes were parsed successfully. That count is what makes incremental parsing practical: after getting a partial result, you must know the offset at which you can resume parsing with parse_headers later on. If httparse returned that information in the partial case, it would make things much easier. It would need to be careful to only return offsets where parsing could be resumed though (i.e. where the next header field should start). I've not tried to implement this yet.

    I've worked around this by scanning forward from the partial data returned by httparse until I reach where the next header field should start, but it's prone to failure if httparse and my code don't make the same decisions.
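
For illustration, here is a minimal sketch of the two workarounds mentioned above: scanning for the first empty header name, and scanning forward to find a resume offset. It is not the code from #160 or from my project. `Header` and `EMPTY_HEADER` are stand-ins shaped like httparse's types so the snippet compiles without the crate, and `parsed_prefix` and `resume_offset` are hypothetical helper names. The offset helper assumes `buf` starts at the first header line and that every header occupies exactly one newline-terminated line, which is exactly the kind of assumption that can drift from what httparse actually does.

```rust
/// Stand-in shaped like `httparse::Header` (assumption: a `name` string and a
/// `value` byte slice), so this sketch builds without the crate.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Header<'a> {
    pub name: &'a str,
    pub value: &'a [u8],
}

/// Stand-in for `httparse::EMPTY_HEADER`.
pub const EMPTY_HEADER: Header<'static> = Header { name: "", value: b"" };

/// Workaround for the first bullet: after a partial parse into an
/// initialized array, the parsed headers are the leading run of entries
/// whose name is non-empty.
pub fn parsed_prefix<'h, 'b>(headers: &'h [Header<'b>]) -> &'h [Header<'b>] {
    let end = headers
        .iter()
        .position(|h| h.name.is_empty())
        .unwrap_or(headers.len());
    &headers[..end]
}

/// Workaround for the third bullet: skip `parsed` newline-terminated lines
/// and return the offset where the next header field should start.
/// Returns `None` if `buf` holds fewer than `parsed` complete lines.
/// Assumes one header per line (no folded values), which may not match
/// the parser's own decisions.
pub fn resume_offset(buf: &[u8], parsed: usize) -> Option<usize> {
    let mut offset = 0;
    for _ in 0..parsed {
        let nl = buf[offset..].iter().position(|&b| b == b'\n')?;
        offset += nl + 1;
    }
    Some(offset)
}
```

With two headers recovered by `parsed_prefix`, calling `resume_offset(buf, 2)` on the header bytes gives the position to hand back to the parser once more data arrives. Having httparse report that offset itself would remove the need for the second helper entirely.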

@seanmonstar, what do you think about this? Is there any chance these changes might be accepted in a new major version? Or any chance the first two bullets would be accepted without increasing the major version number?
