Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix S3 gateway cross-repo copies #7468

Merged
merged 12 commits into from
Feb 19, 2024
Merged

Conversation

arielshaqed
Copy link
Contributor

putobject and friends were confusing src and dst repositories, which will not do.

Fixes #7467.

putobject and friends were confusing src and dst repositories, which will
not do.

Fixes #7467.
@arielshaqed arielshaqed added bug Something isn't working area/gateway Changes to the gateway include-changelog PR description should be included in next release changelog labels Feb 15, 2024
@arielshaqed
Copy link
Contributor Author

Reviewers: Adding relevant test to Esti, but please review the change for now.

Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

10 passed

@arielshaqed arielshaqed requested a review from N-o-Z February 15, 2024 14:33
@arielshaqed
Copy link
Contributor Author

Reviewers: Adding relevant test to Esti, but please review the change for now.

Actually... testing this may not be possible with the current S3 gateway test, which uses the MinIO client, which I am not sure supports the multipart copies API. May take a while, will open a tech-debt issue if I cannot get this done here today.

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielshaqed Thanks, code is great no comments on the changes. However, can you please add a test scenario for that

@arielshaqed
Copy link
Contributor Author

@arielshaqed Thanks, code is great no comments on the changes. However, can you please add a test scenario for that

Thanks! There are some issues with testing, I'm still trying to figure out if I can use MinIO client ComposeObject to do this.

@N-o-Z
Copy link
Member

N-o-Z commented Feb 15, 2024

may not be pos

Reviewers: Adding relevant test to Esti, but please review the change for now.

Actually... testing this may not be possible with the current S3 gateway test, which uses the MinIO client, which I am not sure supports the multipart copies API. May take a while, will open a tech-debt issue if I cannot get this done here today.

@arielshaqed for that specific test you can use an S3 client and limit the test for s3 blockstore only, though I think minio does support multipart upload

@arielshaqed arielshaqed force-pushed the bug/7467-s3-gw-copy-from-boto3 branch from 37b3281 to 1dcdf87 Compare February 15, 2024 16:06
We might only have a problem with sigV2, which is considerably less
important by now.
This should make MPU copies that are signed using sigV2 work with the S3
gateway.
Otherwise infinite loop on upload >:-(
Exercise another code path, don't triple data size, and keep the originally
intended logic.
Repositories not always deleted, so if we don't do this we get a conflict.
@arielshaqed
Copy link
Contributor Author

@guy-har maybe you know? I managed to get Esti to pass, except for...

  1. SigV2. I am still missing some headers to sign. This is less urgent, I doubt we have many users of MPU copy with sigV2. So I am happy to open a tech debt issue and pull without it.
  2. Azure tests fail due to "InvalidBlockList". This would need to be an error in the Azure block adapter. The obvious suspect is the conversion from symmetrical bounds in the S3 API (bytes [start,end]) to asymmetrical bounds in the Azure BlobStore API (bytes [offset, offset+count)). But I had a look and we do add 1 to get the required bounds. Any ideas what else to try?

@arielshaqed arielshaqed force-pushed the bug/7467-s3-gw-copy-from-boto3 branch 3 times, most recently from df0ada8 to 2d1fdf6 Compare February 18, 2024 09:01
copyPartRange should stage a block on the destination.  This matters When
source and destination containers are different.
Copy link
Contributor Author

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-o-Z I added an integration test to Esti. That was a great call because it uncovered an issue with the Azure block adapter! So I fixed that one too. Please re-re-review.

@guy-har I'd love to have your review of everything, especially of the changes to that Azure block adapter :-)

Please note that sigv2 still doesn't work. Opened #7472 to do that. But we should pull anyway: nobody should use sigV2, it's really hard to do nowadays, it solves the original problem for Boto3/Python users who are currently complaining, and lakeFS with this code is better than without it!

@arielshaqed arielshaqed requested a review from N-o-Z February 18, 2024 14:37
Copy link
Contributor

@guy-har guy-har left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Great!
Thanks!

@@ -328,6 +328,89 @@ func TestS3HeadBucket(t *testing.T) {
})
}

func TestS3CopyObjectMultipart(t *testing.T) {
const minDataContentLengthForMultipart = 5 << 20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's a bit confusing that minDataContentLengthForMultipart and largeDataContentLength exist in different files and only used here

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test!!
Some optional suggestion

@@ -328,6 +328,88 @@ func TestS3HeadBucket(t *testing.T) {
})
}

func TestS3CopyObjectMultipart(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: 🎉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing tests are good for are finding bugs. And fixing them. And preventing regressions. Automatically. Not sure it's worth the effort.

Although... in this case this test actually found bugs. And helped fix one of them. And another test in the file prevented a regression. Automatically. Actually the case against writing tests has never been weaker 🥳 .

blobSizesURL := destinationContainer.NewBlockBlobClient(destinationObjName + sizeSuffix)
_, err = blobSizesURL.StageBlock(ctx, base64Etag, streaming.NopCloser(strings.NewReader(sizeData)), nil)
sizeData := fmt.Sprintf("%d\n", count)
blobSizesBlob := destinationContainer.NewBlockBlobClient(destinationKey.BlobURL + sizeSuffix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving sizeSuffix and idSuffix to this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done in this PR.

}

// RandomReader returns a reader that will return size bytes from rand.
func RandomReader(rand *rand.Rand, size int64) io.Reader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func RandomReader(rand *rand.Rand, size int64) io.Reader {
func NewRandomReader(rand *rand.Rand, size int64) io.Reader {

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, yes please!

Because it returns a new random reader, and also that matches the Go naming
conventions. :-)
Copy link
Contributor Author

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Pulling once all tests pass.

blobSizesURL := destinationContainer.NewBlockBlobClient(destinationObjName + sizeSuffix)
_, err = blobSizesURL.StageBlock(ctx, base64Etag, streaming.NopCloser(strings.NewReader(sizeData)), nil)
sizeData := fmt.Sprintf("%d\n", count)
blobSizesBlob := destinationContainer.NewBlockBlobClient(destinationKey.BlobURL + sizeSuffix)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done in this PR.

@@ -328,6 +328,88 @@ func TestS3HeadBucket(t *testing.T) {
})
}

func TestS3CopyObjectMultipart(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing tests are good for are finding bugs. And fixing them. And preventing regressions. Automatically. Not sure it's worth the effort.

Although... in this case this test actually found bugs. And helped fix one of them. And another test in the file prevented a regression. Automatically. Actually the case against writing tests has never been weaker 🥳 .

}

// RandomReader returns a reader that will return size bytes from rand.
func RandomReader(rand *rand.Rand, size int64) io.Reader {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, yes please!

@arielshaqed arielshaqed enabled auto-merge (squash) February 19, 2024 08:22
@arielshaqed
Copy link
Contributor Author

license/cla stuck. But I work here, I literally signed on paper to grant the project my rights for work on lakeFS. Pulling by adminiforce.

@arielshaqed arielshaqed disabled auto-merge February 19, 2024 08:46
@arielshaqed arielshaqed merged commit 2cc79fa into master Feb 19, 2024
34 checks passed
@arielshaqed arielshaqed deleted the bug/7467-s3-gw-copy-from-boto3 branch February 19, 2024 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/gateway Changes to the gateway bug Something isn't working include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support cross-repository copy and MPU copy in S3 gateway
3 participants