
[PM-1920] Integrity logic when doing 3rd party staging #2033

Open
mayani opened this issue Dec 14, 2024 · 0 comments
Labels
major Major loss of function. sync-from-jira Synced from Jira


mayani commented Dec 14, 2024

If you have a nonsharedfs workflow whose inputs are on, for example, s3:// and whose intermediate storage is also on s3://, Pegasus will add a stage-in job, and pegasus-transfer will carry it out as a 3rd party transfer. Example:

/usr/bin/pegasus-s3 cp -f -c s3://aws/input/foo.txt s3://aws/intermediate/foo.txt

The problem is that the planner assumes it will get checksums for those files, but pegasus-transfer can't generate them because the files never touch the submit host. Subsequent jobs then fail because the file checksums are missing from the meta files passed to them.

One solution would be to make the planner 3rd-party transfer aware, but that decision is usually a runtime decision by pegasus-transfer.
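
To make "3rd-party transfer aware" concrete, here is a rough sketch of the kind of check involved. It is not Pegasus code: `is_third_party` and `LOCAL_SCHEMES` are hypothetical names, and the rule that only `file://` URLs and bare paths count as local is an assumption for illustration.

```python
from urllib.parse import urlparse

# Assumption: these schemes mean "the file is on the host running the transfer".
LOCAL_SCHEMES = {"", "file"}


def is_third_party(src_url: str, dst_url: str) -> bool:
    """Return True when neither endpoint is local, so no bytes pass through this host."""
    src_local = urlparse(src_url).scheme in LOCAL_SCHEMES
    dst_local = urlparse(dst_url).scheme in LOCAL_SCHEMES
    return not src_local and not dst_local
```

With the example above, `is_third_party("s3://aws/input/foo.txt", "s3://aws/intermediate/foo.txt")` is true, which is the case where no checksum can be computed locally. The planner cannot always make the same call ahead of time, which is the difficulty noted above.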

Another solution would be to make the checksums optional and skip integrity checking if they are missing - I think we discussed this in the past, but I can't remember what we decided.
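
A minimal sketch of what "optional checksums" could look like on the verifying side. This is illustrative only and not the actual integrity-checking code: `verify_file` is a hypothetical helper, SHA-256 is assumed as the checksum type, and a missing checksum is represented by `None`.

```python
import hashlib
from typing import Optional


def verify_file(path: str, expected_sha256: Optional[str]) -> str:
    """Return "skipped", "ok" or "mismatch" for one file."""
    # No checksum recorded (e.g. the file arrived via a 3rd party transfer):
    # skip the check instead of failing the job.
    if expected_sha256 is None:
        return "skipped"

    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    return "ok" if digest.hexdigest() == expected_sha256.lower() else "mismatch"
```

The trade-off is that a "skipped" result is silent unless it is logged, so files staged by 3rd party transfer would go unverified for the rest of the workflow.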

A third option would be for pegasus-transfer to pull down a copy and compute the checksum, but that would negate the benefits of 3rd party transfers.
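
For reference, the cost of this option is easy to see in a sketch: every byte of the object has to stream through the host computing the digest. The helper below is hypothetical (`sha256_of_stream` is not a Pegasus function), assumes SHA-256, and leaves out how the chunks are fetched from the remote store.

```python
import hashlib
from typing import Iterable


def sha256_of_stream(chunks: Iterable[bytes]) -> str:
    """Hash a file delivered as a sequence of byte chunks, e.g. while downloading it."""
    digest = hashlib.sha256()
    for chunk in chunks:
        # Each chunk must be pulled to this host, which is exactly the
        # traffic a 3rd party transfer is meant to avoid.
        digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_stream([b"foo", b"bar"])` gives the same digest as hashing `b"foobar"` in one call, so the copy would not need to be kept on disk, only read once.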

Reporter: @rynge
Watchers:
@rynge

@mayani mayani changed the title PM-1920 [PM-1920] Integrity logic when doing 3rd party staging Dec 14, 2024
@mayani mayani added the sync-from-jira Synced from Jira label Dec 14, 2024
@mayani mayani added the major Major loss of function. label Dec 14, 2024