
Question about preamble #65

Closed
Kludex opened this issue Nov 10, 2024 · 6 comments

Comments

@Kludex

Kludex commented Nov 10, 2024

Hi @defnull

I'm trying to understand when we should raise an error in certain situations.

The situation I have at hand is this one:

from multipart import PushMultipartParser

parser = PushMultipartParser(b"boundary123")

for r in parser.parse(b"--boundary123\r"):
    ...

for r in parser.parse(b"\n"):
    ...

print(parser._state)

What is the expected state?

I would expect the whole --boundary123\r to be buffered, then \n to be appended to the buffer, and the state to become HEADER, but that's not what happens. The buffer is only partially consumed: the offset becomes -3 after the first chunk, leaving 23\r in the buffer. Then \n is appended.

What do you think? Is this behavior correct?

@defnull
Owner

defnull commented Nov 10, 2024

That sounds like a bug I already found and fixed in #64. It's unlikely to trigger because the first chunk is usually bigger than the boundary, but a new test case that feeds single bytes into the parser found it.
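The single-byte test pattern mentioned here can be sketched generically: feed the same payload in chunks of varying sizes, down to one byte, and assert the parser produces identical output each time. The helper and the stand-in parser below are invented for illustration; the real test in the multipart repo uses PushMultipartParser directly.

```python
# Sketch of a chunk-border stress test. feed_in_chunks and LineSplitter are
# hypothetical names, not part of the multipart library; the point is the
# pattern of feeding identical input at different chunk sizes.

def feed_in_chunks(parse, payload: bytes, size: int):
    """Call parse() with consecutive slices of payload, size bytes each."""
    results = []
    for i in range(0, len(payload), size):
        results.extend(parse(payload[i:i + size]))
    return results

class LineSplitter:
    """Stand-in push parser: yields CRLF-terminated lines, buffering partials."""
    def __init__(self):
        self.buffer = b""

    def parse(self, chunk: bytes):
        self.buffer += chunk
        lines = []
        while b"\r\n" in self.buffer:
            line, _, self.buffer = self.buffer.partition(b"\r\n")
            lines.append(line)
        return lines

payload = b"--boundary123\r\nheader: value\r\n"
whole = LineSplitter().parse(payload)
byte_by_byte = feed_in_chunks(LineSplitter().parse, payload, 1)
assert whole == byte_by_byte == [b"--boundary123", b"header: value"]
```

A bug like the one fixed in #64 only surfaces when a chunk border falls inside the boundary, which is exactly what the one-byte case forces.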

@defnull
Owner

defnull commented Nov 10, 2024

The PR is now ready and has 100% coverage again. Would you like to give it a try?

@Kludex
Author

Kludex commented Nov 10, 2024

I've tested. It does move to HEADER now. Thanks.

I'm interested in creating a language-agnostic test suite for multipart. My idea is to have something like:

[[tests]]
name = "preamble_cr_after_delimiter"
data_files = ["preamble_cr_after_delimiter_1.http", "preamble_cr_after_delimiter_2.http"]

[tests.expected]
error = false
incomplete = true
# headers = ?
# data = ?

The preamble_cr_after_delimiter_1.http:

--boundary\r

The preamble_cr_after_delimiter_2.http:

--boundary\r\n

What do you think?

@defnull
Owner

defnull commented Nov 10, 2024

I like the general idea of a machine-readable test suite that could maybe even be tested against multiple implementations. But I'd prefer a format where you can easily see the entire payload and don't have to open or create multiple files. Repetition is fine in test cases, as long as it makes the individual test more comprehensible.

How about one TOML or YAML file per test (the test name can be derived from the file name) with an array of strings as the payload? Each string is one chunk fed to the parser, so testing chunk borders is also possible. Some magic strings could be expanded to make it easier to write tests for large inputs. For example, {xyz * 100} within the payload would be replaced by the string xyz repeated 100 times.
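The magic-string expansion could be implemented as a small regex pass over each chunk before it is fed to the parser. The {name * count} syntax and the function name below just follow the suggestion above; nothing here is an existing multipart API.

```python
import re

# Expand "{xyz * 100}" placeholders: the literal string before "*" is
# repeated the given number of times. Hypothetical helper for the proposed
# test-suite format, not part of any existing library.
_MAGIC = re.compile(r"\{(.+?) \* (\d+)\}")

def expand_magic(chunk: str) -> str:
    return _MAGIC.sub(lambda m: m.group(1) * int(m.group(2)), chunk)

print(expand_magic("{ab * 3}--boundary"))  # ababab--boundary
```

Chunks without placeholders pass through unchanged, so existing hand-written payloads would still work.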

Not sure how to express the expected results, though. For tests that should parse fully, you'd at least need to expect an ordered list of field names, optional filenames, payloads (or at least payload sizes), and maybe content types to make sure the parser did not miss anything.

Error cases are also difficult to model. If the test suite is for a specific parser, it would be nice to expect a specific error code or message. But what if it should be parser-independent? Some parsers do not check for completeness and some are more forgiving than others. For some edge cases there is no 'correct' solution because the RFC sucks. Some parsers break the RFC on purpose.
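Combining the ideas above, one possible single-file shape might look like the following sketch (all key names here are invented for illustration, with a coarse error flag rather than parser-specific messages):

```toml
# Hypothetical single-file test format; names are illustrative only.
payload = [
    "--boundary\r\n",
    "Content-Disposition: form-data; name=\"file\"; filename=\"a.txt\"\r\n",
    "Content-Type: text/plain\r\n\r\n",
    "{x * 100}",
    "\r\n--boundary--\r\n",
]

[expected]
error = false       # coarse, parser-independent flag
incomplete = false

[[expected.parts]]
name = "file"
filename = "a.txt"
content_type = "text/plain"
size = 100          # payload size instead of full payload
```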

@defnull
Owner

defnull commented Nov 10, 2024

That said, I would really like to get rid of this junk and have all that in separate files. Those are only 'valid' inputs and easier to model. Maybe a good start.

@defnull
Owner

defnull commented Nov 11, 2024

PR #64 is merged; I'm preparing a release.

@defnull defnull closed this as completed Nov 11, 2024