From ee1722291573181233a07e2f2e7b0dfa9b7b5eb8 Mon Sep 17 00:00:00 2001
From: Barak Amar
Date: Sun, 3 Dec 2023 09:34:22 +0200
Subject: [PATCH] docs: update AWS policy for s3 express support (#7080)

---
 docs/howto/deploy/aws.md | 63 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/docs/howto/deploy/aws.md b/docs/howto/deploy/aws.md
index 09a79763c79..2a1d17a56c8 100644
--- a/docs/howto/deploy/aws.md
+++ b/docs/howto/deploy/aws.md
@@ -174,15 +174,62 @@ Checkout Nginx [documentation](https://kubernetes.github.io/ingress-nginx/user-g
 ## Prepare your S3 bucket
-1. From the S3 Administration console, choose _Create Bucket_.
+1. Take note of the bucket name you want to use with lakeFS
 2. Use the following as your bucket policy, filling in the placeholders:
+
-
+
+   ```json
+   {
+     "Id": "lakeFSPolicy",
+     "Version": "2012-10-17",
+     "Statement": [
+       {
+         "Sid": "lakeFSObjects",
+         "Action": [
+           "s3:GetObject",
+           "s3:PutObject",
+           "s3:AbortMultipartUpload",
+           "s3:ListMultipartUploadParts"
+         ],
+         "Effect": "Allow",
+         "Resource": ["arn:aws:s3:::[BUCKET_NAME_AND_PREFIX]/*"],
+         "Principal": {
+           "AWS": ["arn:aws:iam::[ACCOUNT_ID]:role/[IAM_ROLE]"]
+         }
+       },
+       {
+         "Sid": "lakeFSBucket",
+         "Action": [
+           "s3:ListBucket",
+           "s3:GetBucketLocation",
+           "s3:ListBucketMultipartUploads"
+         ],
+         "Effect": "Allow",
+         "Resource": ["arn:aws:s3:::[BUCKET]"],
+         "Principal": {
+           "AWS": ["arn:aws:iam::[ACCOUNT_ID]:role/[IAM_ROLE]"]
+         }
+       }
+     ]
+   }
+   ```
+
+   * Replace `[BUCKET_NAME]`, `[ACCOUNT_ID]` and `[IAM_ROLE]` with values relevant to your environment.
+   * `[BUCKET_NAME_AND_PREFIX]` can be the bucket name. If you want to minimize the bucket policy permissions, use the bucket name together with a prefix (e.g. `example-bucket/a/b/c`).
+     This way, lakeFS will be able to create repositories only under this specific path (see: [Storage Namespace][understand-repository]).
+   * lakeFS will try to assume the role `[IAM_ROLE]`.
+
+
+
+   To use an S3 Express One Zone _directory bucket_, use the following policy. Note the `lakeFSDirectoryBucket` statement which is specifically required for using a directory bucket.
+
    ```json
    {
      "Id": "lakeFSPolicy",
@@ -214,6 +261,14 @@ Checkout Nginx [documentation](https://kubernetes.github.io/ingress-nginx/user-g
          "Principal": {
            "AWS": ["arn:aws:iam::[ACCOUNT_ID]:role/[IAM_ROLE]"]
          }
+       },
+       {
+         "Sid": "lakeFSDirectoryBucket",
+         "Action": [
+           "s3express:CreateSession"
+         ],
+         "Effect": "Allow",
+         "Resource": "arn:aws:s3express:[REGION]:[ACCOUNT_ID]:bucket/[BUCKET_NAME]"
        }
      ]
    }
@@ -227,7 +282,7 @@ Checkout Nginx [documentation](https://kubernetes.github.io/ingress-nginx/user-g
    If required lakeFS can operate without accessing the data itself, this permission section is useful if you are using [presigned URLs mode][presigned-url] or the [lakeFS Hadoop FileSystem Spark integration][integration-hadoopfs]. Since this FileSystem performs many operations directly on the storage, lakeFS requires less permissive permissions, resulting in increased security.
-
+
    lakeFS always requires permissions to access the `_lakefs` prefix under your storage namespace, in which metadata is stored ([learn more][understand-commits]).
@@ -237,7 +292,7 @@ Checkout Nginx [documentation](https://kubernetes.github.io/ingress-nginx/user-g
    * Upload objects through Spark using the S3 gateway
    * Run `lakectl fs` commands (unless using **presign mode** with `--pre-sign` flag)
    * Use [Actions and Hooks](/howto/hooks/)
-
+
    ```json
    {
      "Id": "[POLICY_ID]",