
[SAT-22998] Ensure Pulp closes the connection on corrupted streamed content #6026

Merged

Conversation

pedro-psb
Member

@pedro-psb pedro-psb commented Nov 14, 2024

Because of a design choice to stream chunk-by-chunk as we receive content from the Remote (to minimize response time and allow a stream-only workflow), as soon as we forward the first chunk all HTTP headers are already sent and we can't roll that back anymore (for example, to set a 404).

We could let the user find out about that on their own, but we prefer to prevent the client from unnecessarily saving/validating something we already know is wrong. Because of that, we chose to close the TCP connection and emit a more helpful message in the Pulp logs, targeted at admins.
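
A minimal sketch of the mechanism, assuming an aiohttp-style streaming handler (illustrative only, not the actual pulpcore handler; remote_chunks, expected_sha256, and the error message are made up):

import hashlib

from aiohttp import web


class DigestValidationError(Exception):
    """Raised when the streamed data does not match the expected digest (illustrative)."""


async def stream_remote_content(request, remote_chunks, expected_sha256):
    response = web.StreamResponse(status=200)
    await response.prepare(request)  # headers (including the 200) go out here

    hasher = hashlib.sha256()
    try:
        async for chunk in remote_chunks:  # chunks arrive from the Remote...
            hasher.update(chunk)
            await response.write(chunk)    # ...and are forwarded to the client immediately
        if hasher.hexdigest() != expected_sha256:
            raise DigestValidationError()
    except DigestValidationError:
        # The 200 and most of the body are already on the wire, so a 404 is no longer
        # possible. Closing the transport without write_eof() makes the client see an
        # incomplete payload instead of a "successful" but corrupt file.
        request.transport.close()
        raise RuntimeError("Streamed data failed digest validation; check the remote artifacts.")
    await response.write_eof()
    return response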

fixes #5012

@pedro-psb pedro-psb force-pushed the close-connection-on-digest-validaton-error branch 4 times, most recently from 239643d to 8f19245 Compare November 19, 2024 21:36
@pedro-psb pedro-psb marked this pull request as ready for review November 19, 2024 21:37
Comment on lines 86 to 87
def _write_3_iso_file_fixture_data_factory(name, exist_ok=False):
    file_fixtures_root.joinpath(name).mkdir(exist_ok=exist_ok)
Member

@mdellweg mdellweg Nov 19, 2024

What is the reason for this?
Since we are writing random files with fixed names to that path, it's probably not OK if it already exists and is being used as fixtures for, e.g., another test at the same time.

Member Author

The reason is that in my test I want to change the content/digest of the files after they have been synced.
I was assuming each test would have its own tmpdir namespace, but I haven't double-checked that. If that's not true, then I agree with you that this is problematic.
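
A rough sketch of what a per-test tmpdir namespace would look like with pytest's tmp_path_factory (this is an assumption about how file_fixtures_root could be defined, not the actual fixture):

import pytest


@pytest.fixture
def file_fixtures_root(tmp_path_factory):
    # Function-scoped: each test gets a fresh, uniquely numbered directory
    # (e.g. .../file_fixtures0, .../file_fixtures1), so fixed file names written by
    # different tests don't collide. A module- or session-scoped root would be shared.
    return tmp_path_factory.mktemp("file_fixtures")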

Member

OK, I see, you want to represent a remote that changes its content. Maybe we can make that more intuitive.
The whole interdependence of these fixture servers is already hard to dissect.

@pedro-psb pedro-psb force-pushed the close-connection-on-digest-validaton-error branch from 8f19245 to 1b43d4b Compare November 21, 2024 12:48
@pedro-psb
Member Author

@dralley Can you also have a look?

@pedro-psb pedro-psb changed the title Ensure Pulp closes the connection on corrupted streamed content [SAT-22998] Ensure Pulp closes the connection on corrupted streamed content Nov 21, 2024
expected_file_list = list(get_files_in_manifest(remote.url))

# Overwrite files in remote_server (change digest but keep name)
basic_manifest_path_overwrite()
Member

@mdellweg mdellweg Nov 21, 2024

Suggested change
basic_manifest_path_overwrite()
write_3_iso_file_fixture_data_factory(overwrite=True)

Member Author

Added that in a new fixup commit with related changes

@pedro-psb pedro-psb force-pushed the close-connection-on-digest-validaton-error branch 2 times, most recently from 23b1247 to 9fe9421 Compare November 21, 2024 19:12
@dralley
Contributor

dralley commented Nov 22, 2024

This looks reasonable for the closing-the-connection approach. Could you update the issue with the rationale behind going w/ that approach over allowing the client to get a checksum failure?

There are pros and cons listed but the actual reason for choosing this option isn't clear from the description.

Member

@mdellweg mdellweg left a comment

This review focuses solely on comments (and docstrings), because the rest looks good to me.
Comments are a delicate matter. Usually, it's best to avoid them altogether.

    download_result = await downloader.run()
except DigestValidationError:
    # Cant recover from wrong data already sent.
    # We should close the connection without sending an EOF in the response
Member

Don't water the message down with words like "should"... We decided we do it at this point. Also, maybe we don't need this comment at all. The whole text is covered by the RuntimeError in the same code block.

Suggested change
# We should close the connection without sending an EOF in the response
# We close the connection without sending an EOF in the response.

Comment on lines 1179 to 1182
# Another possibility is configure the socket to send a RST instead of FIN,
# but I'm not sure if that's required:
# https://serverfault.com/questions/242302/use-of-tcp-fin-and-tcp-rst
# sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
Member

A road we didn't take? I don't think anyone is ever going to read this comment.

If the "too obvious alternative" had not worked, there would be another story here.

Comment on lines 124 to 139
# Create a remote that points to a file repository with 3 iso files
basic_manifest_path = write_3_iso_file_fixture_data_factory("basic")
remote = file_remote_ssl_factory(manifest_path=basic_manifest_path, policy="on_demand")

# Sync from the remote
body = RepositorySyncURL(remote=remote.pulp_href)
monitor_task(
    file_bindings.RepositoriesFileApi.sync(file_repo_with_auto_publish.pulp_href, body).task
)
repo = file_bindings.RepositoriesFileApi.read(file_repo_with_auto_publish.pulp_href)

# Create a distribution from the publication
distribution = file_distribution_factory(repository=repo.pulp_href)

# Download the manifest from the remote
expected_file_list = list(get_files_in_manifest(remote.url))

# Overwrite the 3 iso files to have different content/digest
write_3_iso_file_fixture_data_factory("basic", overwrite=True)

# Assert response is incomplete
get_url = urljoin(distribution.base_url, expected_file_list[0][0])
with pytest.raises(ClientPayloadError, match="Response payload is not completed"):
    download_file(get_url)
Member

I think all the comments here are superfluous.
e.g.: if "expected_file_list = list(get_files_in_manifest(remote.url))" didn't tell you what it does, maybe get_files_in_manifest should be called download_manifest.

Instead, you could add where GIVEN, WHEN and THEN start.

Member Author

Just for some context, most of these steps (and their comments) are copied over from other tests in this file, so I was just following the pattern. But I'm OK with this change; I agree it's more meaningful to structure tests that way.
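
As an illustration of that structure, the steps above could be grouped like this (a sketch only; the test name and fixture wiring are assumed from the quoted snippet, and imports are as in the surrounding test module):

def test_handling_of_corrupted_on_demand_content(
    write_3_iso_file_fixture_data_factory,
    file_remote_ssl_factory,
    file_repo_with_auto_publish,
    file_bindings,
    file_distribution_factory,
    monitor_task,
):
    # GIVEN a repository synced on_demand from a remote whose files are later overwritten
    basic_manifest_path = write_3_iso_file_fixture_data_factory("basic")
    remote = file_remote_ssl_factory(manifest_path=basic_manifest_path, policy="on_demand")
    body = RepositorySyncURL(remote=remote.pulp_href)
    monitor_task(
        file_bindings.RepositoriesFileApi.sync(file_repo_with_auto_publish.pulp_href, body).task
    )
    repo = file_bindings.RepositoriesFileApi.read(file_repo_with_auto_publish.pulp_href)
    distribution = file_distribution_factory(repository=repo.pulp_href)
    expected_file_list = list(get_files_in_manifest(remote.url))
    write_3_iso_file_fixture_data_factory("basic", overwrite=True)

    # WHEN a client requests one of the now-corrupted files
    get_url = urljoin(distribution.base_url, expected_file_list[0][0])

    # THEN the connection is closed before EOF and the download fails
    with pytest.raises(ClientPayloadError, match="Response payload is not completed"):
        download_file(get_url)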

@pedro-psb pedro-psb force-pushed the close-connection-on-digest-validaton-error branch from a0e22f2 to 3df2b0a Compare November 22, 2024 14:10
@pedro-psb
Member Author

This looks reasonable for the closing-the-connection approach. Could you update the issue with the rationale behind going w/ that approach over allowing the client to get a checksum failure?

There are pros and cons listed but the actual reason for choosing this option isn't clear from the description.

#5012 (comment)

@pedro-psb pedro-psb force-pushed the close-connection-on-digest-validaton-error branch from 3df2b0a to 085a459 Compare November 22, 2024 14:22
Member

@mdellweg mdellweg left a comment

To me this looks ready to go. Can you squash the commits before we merge?

CHANGES/5012.bugfix (outdated, resolved)
Assuming we want to keep our stream-redirect approach on the
content-app, we can't recover from wrong data already sent if
the Remote happens to be corrupted (contains wrong binaries).

In order to not give a 200 response to the client, we decided to
close the connection as soon as the request handler realizes
the checksum is wrong.

That only happens after we have already sent the whole blob minus
the EOF, so we close the connection before sending the EOF.

Additionally, we put a message in the logs so admins can see it
and have a chance to manually fix the remote/remote_artifacts.

Co-authored-by: Matthias Dellweg <[email protected]>

fixes pulp#5012
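
For context, what a client is expected to observe when this path triggers is an incomplete-payload error rather than a clean download; a hedged example using aiohttp (the URL is hypothetical):

import asyncio

import aiohttp


async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # The 200 status and headers were already sent before Pulp detected the bad
            # digest, so the failure only surfaces while reading the body.
            try:
                await response.read()
            except aiohttp.ClientPayloadError as exc:
                print(f"download aborted mid-stream: {exc}")


# asyncio.run(fetch("https://pulp.example.com/pulp/content/dist/file.iso"))  # hypothetical URL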
@pedro-psb pedro-psb merged commit 5801bf9 into pulp:main Nov 25, 2024
12 checks passed
Development

Successfully merging this pull request may close these issues.

content-app will fully stream a file to the user before doing any checksum verification