Pushing large files error #99

Open
ermolaev94 opened this issue Sep 6, 2024 · 4 comments
Labels: awaiting response, bug (Something isn't working)

Comments

@ermolaev94

Overview

Pushing large files to an S3 bucket fails with the following error:

Argument partNumber must be an integer between 1 and 10000.: An error occurred (InvalidArgument) when calling the UploadPart operation: Argument partNumber must be an integer between 1 and 10000.

I've tried to fix the situation by setting the chunk size according to the AWS documentation:

# ~/.aws/config
[default]
s3 =
    multipart_chunksize = 512

It does not help. I've tried debugging dvc-s3 and checked that the argument is read, but it's not clear how it is used. I've noticed that the "s3" config stayed empty, while "self._transfer_config" was updated.

The problem starts at a file size of about 800 GB.
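For context: S3 caps a multipart upload at 10,000 parts, so whatever chunk size the uploader actually uses must satisfy chunk_size >= file_size / 10,000. A quick back-of-the-envelope check in plain Python (numbers from this report; the 80 MB figure is just the implied minimum, not a measured value):

import math

GB = 1000 ** 3
MB = 1000 ** 2

file_size = 800 * GB
max_parts = 10_000  # hard S3 limit per multipart upload

# Smallest chunk size that keeps an 800 GB file within 10,000 parts
min_chunk = math.ceil(file_size / max_parts)
print(min_chunk / MB)  # 80.0 -> any effective chunk size under ~80 MB fails here

So if dvc-s3 is uploading with some small built-in chunk size instead of the configured one, errors around this file size are exactly what you'd expect.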

@shcheklein
Member

@ermolaev94 is it an S3-compatible storage (Yandex Cloud or something)? Just curious whether it is something specific to them...

@shcheklein added the bug (Something isn't working) and awaiting response labels on Sep 7, 2024
@dberenbaum
Contributor

According to the AWS docs, it looks like multipart_chunksize takes either a size in bytes or a value with a size suffix, so could it be as simple as setting multipart_chunksize = 512MB?
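For comparison, here is what that setting maps to when driving boto3 directly (a minimal sketch; the endpoint URL and bucket name are placeholders from this thread, not a tested setup):

import boto3
from boto3.s3.transfer import TransferConfig

MB = 1024 ** 2

# Equivalent of "multipart_chunksize = 512MB" in the s3 section of ~/.aws/config
config = TransferConfig(multipart_chunksize=512 * MB)

s3 = boto3.client("s3", endpoint_url="https://storage.yandexcloud.net")
s3.upload_file("large_file.bin", "<bucket-name>", "large_file.bin", Config=config)

If this direct upload succeeds, the config value itself is fine, and the question becomes whether dvc-s3 ever feeds it into its transfer config.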

@ermolaev94
Author

@ermolaev94 is it an S3-compatible storage (Yandex Cloud or something)? Just curious whether it is something specific to them...

It's Yandex Cloud S3; the single-file limit there is 5 TB.

According to the AWS docs, it looks like multipart_chunksize takes either a size in bytes or a value with a size suffix, so could it be as simple as setting multipart_chunksize = 512MB?

Hm, thanks, I'll try this setting. I will return with an update ASAP.

@ermolaev94
Author

ermolaev94 commented Sep 26, 2024

According to the AWS docs, it looks like multipart_chunksize takes either a size in bytes or a value with a size suffix, so could it be as simple as setting multipart_chunksize = 512MB?

I've tried your suggestion, and the error is still the same.

My AWS config file is now the following:

[default]
region = ru-central1
s3 =
    multipart_chunksize = 512MB
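As a sanity check, the CLI can echo the value back (assuming a reasonably recent AWS CLI, which supports dotted lookups into the s3 subsection):

$ aws configure get default.s3.multipart_chunksize
512MB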

I've generated a huge file with the following command:

$ dd if=/dev/urandom of=large_file.bin bs=1M count=1228800

The file is ~1.2 TiB; with a single chunk size of 512 MiB, the part count should be 1,228,800 / 512 = 2,400, well under the 10,000-part limit.

Then I ran dvc add & dvc push:

$ dvc add large_file.bin
$ dvc push large_file.bin.dvc
...
Argument partNumber must be an integer between 1 and 10000.: An error occurred (InvalidArgument) when calling the UploadPart operation: Argument partNumber must be an integer between 1 and 10000.

and got the same error.

Then I tried to push via the AWS CLI:

$ aws --endpoint-url=https://storage.yandexcloud.net/ s3 cp large_file.bin s3://<bucket-name>/large_file.bin

and it works fine:

[screenshot: aws s3 cp completes successfully]

I suppose aws s3 cp does not work the same way dvc push does, but I couldn't find the exact call in the dvc-s3 package to reproduce it. Anyway, it looks like there is a bug.
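One plausible explanation: aws s3 cp uploads through s3transfer, which automatically grows the part size when a file would otherwise exceed 10,000 parts, whereas dvc push goes through s3fs, which does its own chunking and takes its chunk size from its own arguments rather than from the s3 section of ~/.aws/config. A minimal way to test that theory, assuming a recent s3fs where put_file accepts a chunksize keyword:

import s3fs

MiB = 1024 ** 2

fs = s3fs.S3FileSystem(client_kwargs={"endpoint_url": "https://storage.yandexcloud.net"})

# Force a part size large enough that a ~1.2 TiB file stays under 10,000 parts.
# The chunksize keyword is an assumption based on recent s3fs releases.
fs.put_file("large_file.bin", "<bucket-name>/large_file.bin", chunksize=512 * MiB)

If this succeeds where dvc push fails, it would confirm that the configured chunk size never makes it from the AWS config into the s3fs upload path.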
