feat: support custom AWS S3 Endpoint #895
base: master
Can you add some background here? What's the use case? Why is this valuable to you?
This allows using S3-compatible services like Tigris, or self-hosting the storage.
Hi all,
It seems Minio isn't following the standard convention. It works with Tigris
Using the flag
@lukas-runge Great, does this cover your use case?
Yes - just tested it. 👍 Ty @aminya! 🙏
First off, I appreciate the PR, and the discussion from everybody here.
As-is, this PR works, but the implementation is not ideal, because it doesn't account for existing users who want to migrate from one storage provider to another, e.g. from S3 to Tigris. If an existing Keygen CE user with artifacts already stored in S3 wanted to move to Tigris, they couldn't do so safely without replicating their entire S3 bucket into Tigris, which can be very costly for large objects. Without the replication, existing artifacts would no longer be accessible, since they would exist in S3 but not Tigris, and the S3 backend would now point to Tigris and not S3 (which is confusing in itself when there is an R2 backend that is also S3-compatible). This also doesn't support scenarios where some accounts want to store artifacts in S3 while others may be ok with Tigris: with this PR, S3 points to Tigris, breaking accounts that want S3.
In the case of Keygen Cloud, when we moved from S3 to R2, we did so seamlessly, without replicating TBs of data from S3 to R2. We were able to do this because existing artifacts remained in S3, while new artifacts were stored in R2. And this is still the case to this day. Some accounts are still on S3, per their internal requirements, while most are on R2. I want the same story for Keygen CE/EE users moving from S3 or R2 to Tigris or Minio.
So with that said —
Rather than introduce customization/overloads for the S3 storage backend, e.g. endpoint and force_path_style (a config that the user has to understand), I'd rather we introduce dedicated Tigris and Minio backends. The Minio client can set force_path_style: true, so that it works out of the box without the user needing to know about that config.
Both new backends could be configured via TIGRIS_ and MINIO_ env vars, just like AWS_ and CF_ are used to configure S3 and R2, respectively. The code is already designed to be extensible, so we can add TIGRIS and MINIO, and even more storage providers in the future. Each backend would have its own specific environment variables for configuring the client so that it "Just Works."
In addition to the new Tigris and Minio backends, we should add a KEYGEN_STORAGE_BACKEND environment variable to set the default storage backend for new accounts, since there would be more than just the current S3 and R2 backends.
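To make the proposal above concrete, here is a minimal sketch of how per-backend env var prefixes and a default backend might fit together. Everything in it is an assumption for illustration: the constant and method names are hypothetical, the variable suffixes (ACCESS_KEY_ID, BUCKET, and so on) are guesses rather than Keygen's actual variables, and falling back to S3 when KEYGEN_STORAGE_BACKEND is unset is just one possible choice.

```ruby
# Hypothetical sketch; none of these names come from the Keygen codebase.

# Maps each storage backend to the env var prefix that configures it.
BACKEND_PREFIXES = {
  'S3'     => 'AWS_',
  'R2'     => 'CF_',
  'TIGRIS' => 'TIGRIS_',
  'MINIO'  => 'MINIO_',
}.freeze

# Backend-specific defaults, so that e.g. Minio works without the user
# having to know about force_path_style.
BACKEND_DEFAULTS = {
  'MINIO' => { force_path_style: true },
}.freeze

# Builds a config hash for one backend from its prefixed env vars.
# The suffixes used here are assumed, not confirmed by the thread.
def backend_config(backend, env = ENV)
  prefix = BACKEND_PREFIXES.fetch(backend) do
    raise ArgumentError, "unknown storage backend: #{backend}"
  end

  config = {
    access_key_id: env["#{prefix}ACCESS_KEY_ID"],
    secret_access_key: env["#{prefix}SECRET_ACCESS_KEY"],
    bucket: env["#{prefix}BUCKET"],
    endpoint: env["#{prefix}ENDPOINT"],
  }.compact # drop unset (nil) values

  BACKEND_DEFAULTS.fetch(backend, {}).merge(config)
end

# Picks the backend for new accounts; the S3 fallback is an assumption.
def default_backend(env = ENV)
  backend = env.fetch('KEYGEN_STORAGE_BACKEND', 'S3').upcase
  raise ArgumentError, "unknown storage backend: #{backend}" unless BACKEND_PREFIXES.key?(backend)

  backend
end
```

Both methods take the environment as a parameter, so they can be exercised with a plain Hash instead of the real ENV.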
Migrating an existing account would be as simple as running:
account = Account.sole
account.update!(backend: 'TIGRIS')
Existing artifacts, e.g. in R2, would still be accessible, since artifacts have their own backend, and new artifacts would go to Tigris. No replication needed!
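The reason no replication is needed can be shown with a small in-memory sketch. The structs below are hypothetical stand-ins, not Keygen's actual models; the only behavior assumed is what the comment describes, namely that each artifact records the backend it was stored in when it was created.

```ruby
# Hypothetical stand-ins for illustration only, not Keygen's ActiveRecord models.
Account  = Struct.new(:backend)
Artifact = Struct.new(:filename, :backend)

# A new upload records the account's backend at creation time.
def create_artifact(account, filename)
  Artifact.new(filename, account.backend)
end

account = Account.new('R2')
older   = create_artifact(account, 'v1.zip')

# Switching the account's backend only affects artifacts created afterwards.
account.backend = 'TIGRIS'
newer = create_artifact(account, 'v2.zip')

puts "#{older.filename} is read from #{older.backend}"
puts "#{newer.filename} is read from #{newer.backend}"
```

Because reads are resolved from the artifact's own backend rather than the account's, the older artifact keeps being served from where it already lives.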
Would you be open to adjusting the PR?
The idea of having separate classes for each provider sounds good at first glance, but it doesn't match the original promise of these services. As you can see, Tigris only needs the endpoint to be provided, and it uses the same AWS environment variables. When you set up a Fly.io Tigris machine, the exact environment variables I used in this code are set automatically by Fly. So if we change these names, it's gonna make the deployment harder, not easier. https://www.tigrisdata.com/docs/sdks/s3/aws-ruby-sdk/ Another thing is that we can't simply find all the S3 providers and provide wrappers for them. For example, it's my first time hearing about Minio.
Quick follow up. I changed my mind. After looking at what other open source projects do, I'm fine with introducing the following:
AWS_ACCESS_KEY_ID = ENV['AWS_ACCESS_KEY_ID']
AWS_SECRET_ACCESS_KEY = ENV['AWS_SECRET_ACCESS_KEY']
AWS_BUCKET = ENV['AWS_BUCKET']
AWS_REGION = ENV['AWS_REGION']
AWS_ENDPOINT = ENV['AWS_ENDPOINT'] || ENV['AWS_ENDPOINT_URL_S3']
AWS_FORCE_PATH_STYLE = ENV['AWS_FORCE_PATH_STYLE'].in?(%w[true t 1])
AWS_CLIENT_OPTIONS = {
  access_key_id: AWS_ACCESS_KEY_ID,
  secret_access_key: AWS_SECRET_ACCESS_KEY,
  region: AWS_REGION,
  endpoint: AWS_ENDPOINT,
  force_path_style: AWS_FORCE_PATH_STYLE,
}.compact_blank
 .freeze
These seem to be the 2 most commonly used environment variables for configuring S3-compatible object stores. Users who want proper support for a Tigris/Minio/etc. backend can implement it in a separate PR. Thanks for the discussion here, and for your patience. Would you be open to updating the PR, @aminya?
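The suggested constants rely on ActiveSupport (`in?` and `compact_blank`). As a rough illustration of what that snippet computes, here is a plain-Ruby sketch that builds the same kind of options hash from an environment-like Hash. The method name s3_client_options and the TRUTHY constant are hypothetical, and the blank check only approximates compact_blank.

```ruby
# Hypothetical plain-Ruby approximation of the suggested constants.
TRUTHY = %w[true t 1].freeze

def s3_client_options(env = ENV)
  options = {
    access_key_id: env['AWS_ACCESS_KEY_ID'],
    secret_access_key: env['AWS_SECRET_ACCESS_KEY'],
    region: env['AWS_REGION'],
    # Prefer AWS_ENDPOINT, falling back to AWS_ENDPOINT_URL_S3.
    endpoint: env['AWS_ENDPOINT'] || env['AWS_ENDPOINT_URL_S3'],
    force_path_style: TRUTHY.include?(env['AWS_FORCE_PATH_STYLE']),
  }

  # Approximates compact_blank: drop nil, false, and empty or
  # whitespace-only string values.
  options.reject { |_, value| value == false || value.to_s.strip.empty? }.freeze
end
```

With the aws-sdk-s3 gem, a hash like this could be passed along as `Aws::S3::Client.new(**s3_client_options)`. When neither endpoint variable is set and path style is not requested, both keys are dropped, so the client should fall back to its defaults.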
f9d387c to bcd979f
@ezekg Changed the variable names accordingly.
This PR allows setting the AWS S3 endpoint URL. This allows for using S3-compatible services like Tigris or self-hosting the storage.
https://www.tigrisdata.com/docs/sdks/s3/aws-ruby-sdk/