
Question about preamble #65

Closed
Kludex opened this issue Nov 10, 2024 · 6 comments

Comments

@Kludex

Kludex commented Nov 10, 2024

Hi @defnull

I'm trying to understand when we should raise an error in certain situations.

The situation I have at hand is this one:

from multipart import PushMultipartParser

parser = PushMultipartParser(b"boundary123")

for r in parser.parse(b"--boundary123\r"):
    ...

for r in parser.parse(b"\n"):
    ...

print(parser._state)

What is the expected state?

I would expect the whole --boundary123\r to be buffered, then \n to be appended to the buffer, and the state to become HEADER, but that's not what happens. The buffer is only partially consumed: the offset becomes -3 after the first chunk, leaving 23\r in the buffer. Then \n is appended.

What do you think? Is this behavior correct?

@defnull
Owner

defnull commented Nov 10, 2024

That sounds like a bug I already found and fixed in #64. It's unlikely to trigger because the first chunk is usually bigger than the boundary, but a new test case that feeds single bytes into the parser found it.
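The single-byte test pattern mentioned here can be sketched generically: feed the same payload in chunks of varying sizes, down to one byte, and assert the parser produces identical output each time. The helper and the stand-in parser below are invented for illustration; the real test in the multipart repo uses PushMultipartParser directly.

```python
# Sketch of a chunk-border stress test. feed_in_chunks and LineSplitter are
# hypothetical names, not part of the multipart library; the point is the
# pattern of feeding identical input at different chunk sizes.

def feed_in_chunks(parse, payload: bytes, size: int):
    """Call parse() with consecutive slices of payload, size bytes each."""
    results = []
    for i in range(0, len(payload), size):
        results.extend(parse(payload[i:i + size]))
    return results

class LineSplitter:
    """Stand-in push parser: yields CRLF-terminated lines, buffering partials."""
    def __init__(self):
        self.buffer = b""

    def parse(self, chunk: bytes):
        self.buffer += chunk
        lines = []
        while b"\r\n" in self.buffer:
            line, _, self.buffer = self.buffer.partition(b"\r\n")
            lines.append(line)
        return lines

payload = b"--boundary123\r\nheader: value\r\n"
whole = LineSplitter().parse(payload)
byte_by_byte = feed_in_chunks(LineSplitter().parse, payload, 1)
assert whole == byte_by_byte == [b"--boundary123", b"header: value"]
```

A bug like the one fixed in #64 only surfaces when a chunk border falls inside the boundary, which is exactly what the one-byte case forces.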

@defnull
Owner

defnull commented Nov 10, 2024

The PR is now ready and has 100% coverage again. Would you like to give it a try?

@Kludex
Author

Kludex commented Nov 10, 2024

I've tested. It does move to HEADER now. Thanks.

I'm interested in creating a language-agnostic test suite for multipart. My idea is to have something like:

[[tests]]
name = "preamble_cr_after_delimiter"
data_files = ["preamble_cr_after_delimiter_1.http", "preamble_cr_after_delimiter_2.http"]

[tests.expected]
error = false
incomplete = true
# headers = ?
# data = ?

The preamble_cr_after_delimiter_1.http:

--boundary\r

The preamble_cr_after_delimiter_2.http:

--boundary\r\n

What do you think?

@defnull
Owner

defnull commented Nov 10, 2024

I like the general idea of a machine-readable test suite that could maybe even be tested against multiple implementations. But I'd prefer a format where you can easily see the entire payload and don't have to open or create multiple files. Repetition is fine in test cases, as long as it makes the individual test more comprehensible.

How about one TOML or YAML file per test (the test name can be derived from the file name) with an array of strings as the payload? Each string is one chunk fed to the parser, so testing chunk borders is also possible. Some magic strings could be expanded to make it easier to write tests for large inputs. For example, {xyz * 100} within the payload would be replaced by the string xyz repeated 100 times.
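The magic-string expansion could be implemented as a small regex pass over each chunk before it is fed to the parser. The {name * count} syntax and the function name below just follow the suggestion above; nothing here is an existing multipart API.

```python
import re

# Expand "{xyz * 100}" placeholders: the literal string before "*" is
# repeated the given number of times. Hypothetical helper for the proposed
# test-suite format, not part of any existing library.
_MAGIC = re.compile(r"\{(.+?) \* (\d+)\}")

def expand_magic(chunk: str) -> str:
    return _MAGIC.sub(lambda m: m.group(1) * int(m.group(2)), chunk)

print(expand_magic("{ab * 3}--boundary"))  # ababab--boundary
```

Chunks without placeholders pass through unchanged, so existing hand-written payloads would still work.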

Not sure how to express the expected results, though. For tests that should parse fully, you'd at least need to expect an ordered list of field names, optional filenames, payloads (or at least payload sizes), and maybe content types to make sure the parser did not miss anything.

Error cases are also difficult to model. If the test suite is for a specific parser, it would be nice to expect a specific error code or message. But what if it should be parser-independent? Some parsers do not check for completeness and some are more forgiving than others. For some edge cases there is no 'correct' solution because the RFC sucks. Some parsers break the RFC on purpose.
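Combining the ideas above, one possible single-file shape might look like the following sketch (all key names here are invented for illustration, with a coarse error flag rather than parser-specific messages):

```toml
# Hypothetical single-file test format; names are illustrative only.
payload = [
    "--boundary\r\n",
    "Content-Disposition: form-data; name=\"file\"; filename=\"a.txt\"\r\n",
    "Content-Type: text/plain\r\n\r\n",
    "{x * 100}",
    "\r\n--boundary--\r\n",
]

[expected]
error = false       # coarse, parser-independent flag
incomplete = false

[[expected.parts]]
name = "file"
filename = "a.txt"
content_type = "text/plain"
size = 100          # payload size instead of full payload
```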

@defnull
Owner

defnull commented Nov 10, 2024

That said, I would really like to get rid of this junk and have all that in separate files. Those are only 'valid' inputs and easier to model. Maybe a good start.

@defnull
Owner

defnull commented Nov 11, 2024

PR #64 is merged; I'm preparing a release.

@defnull defnull closed this as completed Nov 11, 2024