Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build StorageClient based on blob url #296

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

stevenwarejones
Copy link
Contributor

No description provided.

@wfa-reviewable
Copy link

This change is Reviewable

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier and @stevenwarejones)


a discussion (no related file):
Historically we've avoided having a single binary that depends on multiple cloud provider SDKs. Instead, we've had separate binaries for each provider. This has worked because the operator owns the storage.

Do we have a known use case for supporting multiple storage backends at runtime? e.g. buckets that are not owned by the operator.

Copy link
Contributor Author

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier and @SanjayVas)


a discussion (no related file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Historically we've avoided having a single binary that depends on multiple cloud provider SDKs. Instead, we've had separate binaries for each provider. This has worked because the operator owns the storage.

Do we have a known use case for supporting multiple storage backends at runtime? e.g. buckets that are not owned by the operator.

As discussed, I removed s3 support for now, but we can easily add it back in later.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r3, all commit messages.
Reviewable status: 1 of 4 files reviewed, 4 unresolved discussions (waiting on @kungfucraig, @Marco-Premier, and @stevenwarejones)


a discussion (no related file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

As discussed, I removed s3 support for now, but we can easily add it back in later.

For completeness, there is indeed an expected use case of having buckets that are not owned by the operator. In particular, EDPs may own their own buckets for use with the EDP Aggregator.


src/main/kotlin/org/wfanet/measurement/storage/Selector.kt line 32 at r3 (raw file):

)

fun parseBlobUrl(url: String): BlobUrl? {

Don't define top-level non-extension functions. Wrap this in a class or object instead. We have some legacy code that does this, but new code should not.

See https://github.com/world-federation-of-advertisers/cross-media-measurement/blob/main/docs/code-style.md#kotlin

Maybe StorageClientFactory with a single public method? That would be clearer than "selector", as a factory is a known pattern.


src/main/kotlin/org/wfanet/measurement/storage/Selector.kt line 34 at r3 (raw file):

fun parseBlobUrl(url: String): BlobUrl? {
  // Define regex for different blob URL patterns
  val s3Regex =

Use kotlin.text.Regex instead of java.util.regex.Pattern. Also define these as constants (you won't be able to use the const keyword, but it's still a constant).


src/main/kotlin/org/wfanet/measurement/storage/Selector.kt line 39 at r3 (raw file):

    Pattern.compile("gs://(?<bucket>[^/]+)/(?<path>[^?]+)(?:\\?project=(?<project>[^&]+))?")
  val fileRegex = Pattern.compile("file://(?<path>.+)")
  // Match input URL with the regex

Rather than using regular expressions up-front, I think it'll be clearer to parse into a java.net.URI instance. Then you can just check the scheme to know which StorageClient impl to use. After that, you can further parse out the information you need from the URI components.


src/main/kotlin/org/wfanet/measurement/storage/Selector.kt line 78 at r3 (raw file):

      }
      "file" -> {
        FileSystemStorageClient(directory = File("/"))

I don't think we ever want to allow writing/reading from the true root of the filesystem. Take in a fileSystemRoot: File in the constructor instead.

Code quote:

File("/")

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r1, 1 of 3 files at r3.
Reviewable status: 2 of 4 files reviewed, 6 unresolved discussions (waiting on @Marco-Premier, @SanjayVas, and @stevenwarejones)


src/main/kotlin/org/wfanet/measurement/storage/Selector.kt line 32 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Don't define top-level non-extension functions. Wrap this in a class or object instead. We have some legacy code that does this, but new code should not.

See https://github.com/world-federation-of-advertisers/cross-media-measurement/blob/main/docs/code-style.md#kotlin

Maybe StorageClientFactory with a single public method? That would be clearer than "selector", as a factory is a known pattern.

Also add some java doc that describes what is supported, etc.


src/main/kotlin/org/wfanet/measurement/storage/Selector.kt line 34 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Use kotlin.text.Regex instead of java.util.regex.Pattern. Also define these as constants (you won't be able to use the const keyword, but it's still a constant).

Even though this function won't be called frequently I'd still consider moving them outside of this function so that you only have to pay to compile the regex once. (If indeed you stick with regexes.)


src/main/kotlin/org/wfanet/measurement/storage/Selector.kt line 64 at r3 (raw file):

    )
  }
  return null

Throw an exception if the path doesn't match anything that is supported, that is unless you put this in a class/object in which case you can make it private so that only the getStorageClient function can use it. Then the null would be fine.


src/main/kotlin/org/wfanet/measurement/storage/BUILD.bazel line 40 at r3 (raw file):

kt_jvm_library(
    name = "selector",
    srcs = ["Selector.kt"],

Being a little more explicit about the name would be nice. Something like: StorageClientSelector.kt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants