Returning more information on partial results and errors #162

tgnottingham · 2024-04-18T23:01:48Z

There are a couple of changes I've implemented locally that enable getting more information from the parser on partial results and errors. Unfortunately, they're all breaking changes. I'm curious if they might be accepted in a later version of httparse, however unlikely that may be.

For context, my use case calls for an incremental parser -- that is, one that can resume parsing from the start of a header field, rather than having to restart from the beginning of the start line when a previous parse only gave a partial result. httparse enables this with its ability to return partial data, but there are some difficult spots.

The following changes would make things easier:

Preserving parsed headers on partial results.

When you parse with initialized headers and get a partial result, the headers array contains all of the parsed headers, but it isn't resized. This isn't ideal, but it's workable -- you can scan for the first empty header name to find the end of the parsed headers.

On the other hand, when you parse with uninitialized headers and get a partial result, the headers array doesn't contain the parsed headers. There's no way to safely retrieve the parsed headers as far as I know.

Ideally, headers would be resized appropriately in both cases, avoiding the need to scan for the empty header name, and enabling retrieval of partial headers with uninitialized headers. I implemented this in Resize headers on partial parse #160, but realized that the nature of the breaking change might be too much to accept.

Obviously, this can be worked around by using initialized headers and the scanning approach.
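
For reference, the scanning workaround looks roughly like the sketch below. It assumes the array was filled with empty headers before parsing, and that a parsed header never has an empty name. `Header` and `EMPTY_HEADER` are local stand-ins shaped like httparse's types so the snippet compiles on its own, and `parsed_prefix` is a hypothetical helper name, not something httparse provides.

```rust
/// Local stand-in with the same shape as `httparse::Header`
/// (an assumption made so this sketch is self-contained).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Header<'a> {
    pub name: &'a str,
    pub value: &'a [u8],
}

/// Local stand-in for `httparse::EMPTY_HEADER`.
pub const EMPTY_HEADER: Header<'static> = Header { name: "", value: b"" };

/// Hypothetical helper: returns the leading run of headers that were
/// filled in, found by scanning for the first empty header name.
/// If no empty name is found, the whole array is treated as parsed.
pub fn parsed_prefix<'h, 'b>(headers: &'h [Header<'b>]) -> &'h [Header<'b>] {
    let end = headers
        .iter()
        .position(|h| h.name.is_empty())
        .unwrap_or(headers.len());
    &headers[..end]
}
```

With real httparse types the same scan would be applied to the array after a partial result. It only works if the array started out as all empty headers, which is why it isn't an option for the uninitialized case.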

Preserving parsed headers on errors.

I believe this is the exact same situation as with partial results. I've implemented the same improvements for errors as I did for partial results in Resize headers on partial parse #160, but haven't submitted a PR.

Returning number of parsed bytes on partial results.

When a partial result is returned, it doesn't tell you how many bytes were parsed successfully. That count is what makes incremental parsing practical: after getting a partial result, you must know where you can resume parsing with parse_headers later on. If httparse returned that information in the partial case, it would make things much easier. It would need to be careful to only return offsets where parsing could be resumed, though (i.e. where the next header field should start). I've not tried to implement this yet.

I've worked around this by scanning forward from the partial data returned by httparse until I reach where the next header field should start, but it's prone to failure if httparse and my code don't make the same decisions.
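
To make the fragility concrete, here is a simplified sketch of that kind of forward scan. It is not my actual code, and `resume_offset` is a hypothetical name. It assumes lines are terminated by CRLF only and that there are no folded (continuation) lines. If the parser is more lenient than either assumption, the offsets disagree.

```rust
/// Hypothetical helper: given the bytes seen so far, return the offset
/// just past the last CRLF, i.e. the start of the first line that is not
/// yet known to be complete. Returns `None` if no CRLF has been seen.
///
/// Assumptions (not guaranteed to match the parser's decisions):
/// - lines end with "\r\n" only (a bare "\n" is not treated as a terminator)
/// - no header value continues onto a following line
pub fn resume_offset(buf: &[u8]) -> Option<usize> {
    buf.windows(2)
        .rposition(|w| w == b"\r\n")
        .map(|i| i + 2)
}
```

For example, with `b"Host: a\r\nAcc"` this yields `Some(9)`, the start of the incomplete `Acc` line. An offset reported by the parser itself would avoid having to duplicate its line-ending rules like this.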

@seanmonstar, what do you think about this? Is there any chance these changes might be accepted in a new major version? Or any chance the first two bullets would be accepted without increasing the major version number?
