-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added implementation for Google Cloud Storage #1558
base: main
Are you sure you want to change the base?
Conversation
e465ce3
to
a0dc68b
Compare
Similar to #1547 Dear reviewers, There are many ways to do, I have connected with Community and finally approached with |
a0dc68b
to
d13eb35
Compare
Some of the tests are failing due to a flaky issue. I will fix that before finalizing the PR |
d13eb35
to
0b456d3
Compare
a178189
to
70f93cf
Compare
f012e75
to
6b12feb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good! Main points are the while let
loops which seem inefficient/unnecessarily blocking and use of the Retry
in the config.
Reviewed 2 of 8 files at r1, 13 of 13 files at r5, all commit messages.
Reviewable status: 0 of 2 LGTMs obtained, and all files reviewed, and pending CI: Cargo Dev / macos-13, docker-compose-compiles-nativelink (22.04), pre-commit-checks, and 9 discussions need to be resolved
-- commits
line 2 at r5:
nit: "Add implementation ..." (imperative mood)
nativelink-store/src/gcs_client/client.rs
line 140 at r5 (raw file):
let token = self.auth.get_valid_token().await?; metadata.insert( "authorization",
nit: I believe this should be capitalized by convention.
nativelink-store/src/gcs_client/client.rs
line 150 at r5 (raw file):
metadata.insert( "x-goog-request-params",
nit: Might be a good idea to put a comment like "See: https://xxxxx" so that future readers can see where this value is coming from.
nativelink-store/src/gcs_client/client.rs
line 203 at r5 (raw file):
let mut client_guard = client.write().await; let mut buffer = Vec::with_capacity(size as usize); while let Ok(Some(chunk)) = rx.try_next().await {
This should leverage a bytestream, similar to
nativelink/nativelink-store/src/s3_store.rs
Line 448 in 1ee61b1
.body(ByteStream::from_body_1_x(BodyWrapper { |
nativelink-store/src/gcs_client/client.rs
line 317 at r5 (raw file):
let mut offset = 0; while offset < size {
Instead of dynamically updating the remaining chunks etc this should probably precompute the sizes and run a map over futures similar to
nativelink/nativelink-store/src/s3_store.rs
Line 530 in 1ee61b1
let upload_parts = move || async move { |
nativelink-config/src/stores.rs
line 81 at r5 (raw file):
experimental_s3_store(S3Spec), /// GCS store will use Google's GCS service as a backend to store
nit: "GCS store uses ..."
nativelink-store/tests/gcs_store_test.rs
line 1 at r5 (raw file):
// Copyright 2024 The NativeLink Authors. All rights reserved.
nit: Could you splitt of the tests between the different files? I.e. one test file for each source file. This way it's clearer which tests belong to what code.
nativelink-store/src/gcs_client/auth.rs
line 12 at r5 (raw file):
const SCOPE: &str = "https://www.googleapis.com/auth/cloud-platform"; const AUDIENCE: &str = "https://storage.googleapis.com/"; const TOKEN_LIFETIME: Duration = Duration::from_secs(3600); // 1 hour
All of these hardcoded values should be configurable.
For the retry-related configs, we might be able to use a RetrySpec:
nativelink/nativelink-config/src/stores.rs
Line 1031 in 1ee61b1
pub struct Retry { |
nativelink-store/src/gcs_client/auth.rs
line 97 at r5 (raw file):
fn calculate_retry_delay(attempt: u32) -> Duration { let base = RETRY_DELAY_BASE; let max_delay = Duration::from_secs(30); // Cap maximum delay
This should be configurable
Description
This PR implements the Google Cloud Store implementation and closely aligns with the AWS S3 implementation.
This utilizes the google protos. There were three primary options that I considered here (gcloud_sdk_rs, googleapis-tonic-google-storage-control-v2, or building it locally). Out of all these three options,
googleapis-tonic-google-storage-control-v2
seemed to have the easiest approach to just use the rust bindings, without importing a number of libraries like reqwest, etc.Following the same approach I used for Azure Blob Storage, I started my development by first creating a POC: https://gist.github.com/amankrx/d3861756e00e0d5157cd4230477b4024
This POC tests the operations that we will be performing in the real implementation.
It also has auth set up properly including auto-refresh of the tokens. You should ensure that you have
GCS_PRIVATE_KEY
set up in the environment variable.Fixes #659
/claim #659
Type of change
Please delete options that aren't relevant.
How Has This Been Tested?
I did an e2e testing using the gcs_backend.json5 CAS I have provided. Along with that, I have set up a mocking framework for the GCS client and added a few tests for both the GCS store and the GCS client. I will add a few more tests before merging the PR.
Checklist
bazel test //...
passes locallygit amend
see some docsThis change is