-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build StorageClient based on blob url #296
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier and @stevenwarejones)
a discussion (no related file):
Historically we've avoided having a single binary that depends on multiple cloud provider SDKs. Instead, we've had separate binaries for each provider. This has worked because the operator owns the storage.
Do we have a known use case for supporting multiple storage backends at runtime? e.g. buckets that are not owned by the operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier and @SanjayVas)
a discussion (no related file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
Historically we've avoided having a single binary that depends on multiple cloud provider SDKs. Instead, we've had separate binaries for each provider. This has worked because the operator owns the storage.
Do we have a known use case for supporting multiple storage backends at runtime? e.g. buckets that are not owned by the operator.
As discussed, I removed s3 support for now, but we can easily add it back in later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r3, all commit messages.
Reviewable status: 1 of 4 files reviewed, 4 unresolved discussions (waiting on @kungfucraig, @Marco-Premier, and @stevenwarejones)
a discussion (no related file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
As discussed, I removed s3 support for now, but we can easily add it back in later.
For completeness, there is indeed an expected use case of having buckets that are not owned by the operator. In particular, EDPs may own their own buckets for use with the EDP Aggregator.
src/main/kotlin/org/wfanet/measurement/storage/Selector.kt
line 32 at r3 (raw file):
) fun parseBlobUrl(url: String): BlobUrl? {
Don't define top-level non-extension functions. Wrap this in a class or object instead. We have some legacy code that does this, but new code should not.
Maybe StorageClientFactory
with a single public method? That would be clearer than "selector", as a factory is a known pattern.
src/main/kotlin/org/wfanet/measurement/storage/Selector.kt
line 34 at r3 (raw file):
fun parseBlobUrl(url: String): BlobUrl? { // Define regex for different blob URL patterns val s3Regex =
Use kotlin.text.Regex instead of java.util.regex.Pattern. Also define these as constants (you won't be able to use the const
keyword, but it's still a constant).
src/main/kotlin/org/wfanet/measurement/storage/Selector.kt
line 39 at r3 (raw file):
Pattern.compile("gs://(?<bucket>[^/]+)/(?<path>[^?]+)(?:\\?project=(?<project>[^&]+))?") val fileRegex = Pattern.compile("file://(?<path>.+)") // Match input URL with the regex
Rather than using regular expressions up-front, I think it'll be clearer to parse into a java.net.URI instance. Then you can just check the scheme
to know which StorageClient impl to use. After that, you can further parse out the information you need from the URI components.
src/main/kotlin/org/wfanet/measurement/storage/Selector.kt
line 78 at r3 (raw file):
} "file" -> { FileSystemStorageClient(directory = File("/"))
I don't think we ever want to allow writing/reading from the true root of the filesystem. Take in a fileSystemRoot: File
in the constructor instead.
Code quote:
File("/")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r1, 1 of 3 files at r3.
Reviewable status: 2 of 4 files reviewed, 6 unresolved discussions (waiting on @Marco-Premier, @SanjayVas, and @stevenwarejones)
src/main/kotlin/org/wfanet/measurement/storage/Selector.kt
line 32 at r3 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
Don't define top-level non-extension functions. Wrap this in a class or object instead. We have some legacy code that does this, but new code should not.
Maybe
StorageClientFactory
with a single public method? That would be clearer than "selector", as a factory is a known pattern.
Also add some java doc that describes what is supported, etc.
src/main/kotlin/org/wfanet/measurement/storage/Selector.kt
line 34 at r3 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
Use kotlin.text.Regex instead of java.util.regex.Pattern. Also define these as constants (you won't be able to use the
const
keyword, but it's still a constant).
Even though this function won't be called frequently I'd still consider moving them outside of this function so that you only have to pay to compile the regex once. (If indeed you stick with regexes.)
src/main/kotlin/org/wfanet/measurement/storage/Selector.kt
line 64 at r3 (raw file):
) } return null
Throw an exception if the path doesn't match anything that is supported, that is unless you put this in a class/object in which case you can make it private so that only the getStorageClient function can use it. Then the null would be fine.
src/main/kotlin/org/wfanet/measurement/storage/BUILD.bazel
line 40 at r3 (raw file):
kt_jvm_library( name = "selector", srcs = ["Selector.kt"],
Being a little more explicit about the name would be nice. Something like: StorageClientSelector.kt
No description provided.