
As a node operator, I want to upload to Registry without downloading data from s3 #349

Open
rgdeen opened this issue Nov 22, 2024 · 0 comments
Labels: needs:triage, requirement


rgdeen commented Nov 22, 2024

Checked for duplicates

Yes - I've already checked

πŸ§‘β€πŸ”¬ User Persona(s)

node operator - those putting data in the Registry

💪 Motivation

For our high-volume missions, data comes to us (IMG) from the data provider in S3. We never have a complete copy on disk anywhere. Validation is done piecewise on a KDP cluster. We then do an S3-to-S3 transfer to the public bucket, where the data needs to be registered.

Currently, all the data must be downloaded somewhere before it can be registered, which is problematic for 10 TB deliveries.

Downloading the labels is tractable, but downloading the data isn't. The data objects in S3 have (or can have) an rclone-style MD5 checksum that can be retrieved on its own, which should obviate the need for the data itself.
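A minimal sketch of how the S3-side checksum might be obtained without fetching the object body. It assumes the input is a dict shaped like a boto3 S3 `head_object` response, and that rclone stored the checksum as a base64-encoded MD5 under the user-metadata key `md5chksum` (sent as `x-amz-meta-md5chksum`; boto3 strips the prefix). As a fallback it uses the ETag, which for single-part uploads without SSE-KMS is typically the hex MD5. The helper names are hypothetical. This is not existing Registry or harvest functionality.

```python
import base64
import binascii
import string
from typing import Optional


def md5_from_head_object(head: dict) -> Optional[str]:
    """Return the object's MD5 as lowercase hex, or None if it can't be determined.

    `head` is assumed to look like a boto3 S3 `head_object` response.
    """
    # Assumed rclone convention: base64 MD5 under user metadata key "md5chksum".
    b64 = head.get("Metadata", {}).get("md5chksum")
    if b64:
        try:
            digest = base64.b64decode(b64, validate=True)
        except (binascii.Error, ValueError):
            return None
        # An MD5 digest is exactly 16 bytes; anything else is not a usable checksum.
        return digest.hex() if len(digest) == 16 else None

    # Fallback: a single-part, non-KMS upload usually has the hex MD5 as its ETag.
    # Multipart ETags contain a "-" and are not an MD5 of the whole object.
    etag = head.get("ETag", "").strip('"')
    if len(etag) == 32 and all(c in string.hexdigits for c in etag):
        return etag.lower()
    return None


def checksum_matches(head: dict, label_md5: str) -> bool:
    """Compare the S3-side MD5 with the md5_checksum value taken from the label."""
    remote = md5_from_head_object(head)
    return remote is not None and remote == label_md5.strip().lower()
```

With this approach, only the labels (which carry the expected checksum) and a metadata request per data object would be needed, rather than the data itself.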

Yes, we can download piecewise, but that adds more steps that could go wrong and risks missing files (for example, I would never trust the KDP piecewise processing, as it has proven unreliable).

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

βš™οΈ Engineering Details

No response

🎉 I&T

No response

Projects
Status: ToDo