Adds support for COPY TO/FROM Azure Blob Storage #55

Open. aykut-bozkurt wants to merge 5 commits into main from aykut/azure-blob-storage.

@aykut-bozkurt aykut-bozkurt (Collaborator) commented Oct 23, 2024

Supports the following Azure Blob URI forms:

  • az://{container}/key
  • azure://{container}/key
  • https://{account}.blob.core.windows.net/{container}
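
As a rough illustration of how the three URI forms above could be told apart, here is a minimal sketch using only the standard library. `AzureLocation` and `parse_azure_uri` are hypothetical names made up for this example, not the PR's code in `src/arrow_parquet/uri_utils.rs`, and treating a missing key as an empty string is an assumption.

```rust
/// Parts carried by the Azure Blob URI forms listed above.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct AzureLocation {
    /// Storage account; only the https form carries it.
    account: Option<String>,
    container: String,
    /// Path inside the container; empty when the URI names only a container.
    key: String,
}

/// Hypothetical helper: parse one of the three URI forms, or return None.
fn parse_azure_uri(uri: &str) -> Option<AzureLocation> {
    // az://{container}/key and azure://{container}/key
    for scheme in ["az://", "azure://"] {
        if let Some(rest) = uri.strip_prefix(scheme) {
            let (container, key) = rest.split_once('/').unwrap_or((rest, ""));
            if container.is_empty() {
                return None;
            }
            return Some(AzureLocation {
                account: None,
                container: container.to_string(),
                key: key.to_string(),
            });
        }
    }

    // https://{account}.blob.core.windows.net/{container}
    let rest = uri.strip_prefix("https://")?;
    let (host, path) = rest.split_once('/')?;
    let account = host.strip_suffix(".blob.core.windows.net")?;
    let (container, key) = path.split_once('/').unwrap_or((path, ""));
    if account.is_empty() || container.is_empty() {
        return None;
    }
    Some(AzureLocation {
        account: Some(account.to_string()),
        container: container.to_string(),
        key: key.to_string(),
    })
}
```

With the `az://` and `azure://` forms the account is not part of the URI, so it has to come from the configuration described below.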

Configuration

The simplest way to configure object storage is by creating the standard ~/.azure/config file:

$ cat ~/.azure/config
[storage]
account = devstoreaccount1
key = Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
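
To make the file format concrete, the following sketch reads the `[storage]` section of INI-style text like the example above. `parse_storage_section` is a hypothetical helper written for this note; the PR may well use an INI parsing crate instead, and the comment-character handling here is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical helper: collect `name = value` pairs from the `[storage]`
/// section of INI-style text. Other sections are ignored.
fn parse_storage_section(text: &str) -> HashMap<String, String> {
    let mut in_storage = false;
    let mut values = HashMap::new();

    for line in text.lines() {
        let line = line.trim();
        // Skip blank lines and comment lines (assumed to start with '#' or ';').
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        // A section header switches the current section.
        if line.starts_with('[') && line.ends_with(']') {
            in_storage = line == "[storage]";
            continue;
        }
        if in_storage {
            // Split on the first '=' only: base64 keys end in '=' padding.
            if let Some((name, value)) = line.split_once('=') {
                values.insert(name.trim().to_string(), value.trim().to_string());
            }
        }
    }
    values
}
```

Splitting on the first `=` matters for the `key` entry, because a base64-encoded storage key usually ends with `=` padding.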

Alternatively, you can use the following environment variables when starting postgres to configure the Azure Blob Storage client:

  • AZURE_STORAGE_ACCOUNT: the storage account name of the Azure Blob
  • AZURE_STORAGE_KEY: the storage key of the Azure Blob
  • AZURE_STORAGE_SAS_TOKEN: the storage SAS token for the Azure Blob
  • AZURE_CONFIG_FILE: an alternative location for the config file
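
The four variables above could be gathered along these lines. This is a sketch under assumptions, not the PR's implementation: `AzureEnvSettings` and `read_azure_settings` are hypothetical names, and the variable lookup is passed in as a closure so the same function works against the real environment or a fixed set of values.

```rust
/// Values of the environment variables listed above.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct AzureEnvSettings {
    account: Option<String>,
    key: Option<String>,
    sas_token: Option<String>,
    /// Falls back to the standard location when AZURE_CONFIG_FILE is unset.
    /// Note that "~" is not expanded here; a real implementation would
    /// resolve the home directory.
    config_file: String,
}

/// Hypothetical helper: read the settings through `lookup`, which maps a
/// variable name to its value if it is set.
fn read_azure_settings(lookup: impl Fn(&str) -> Option<String>) -> AzureEnvSettings {
    AzureEnvSettings {
        account: lookup("AZURE_STORAGE_ACCOUNT"),
        key: lookup("AZURE_STORAGE_KEY"),
        sas_token: lookup("AZURE_STORAGE_SAS_TOKEN"),
        config_file: lookup("AZURE_CONFIG_FILE")
            .unwrap_or_else(|| "~/.azure/config".to_string()),
    }
}

/// Convenience wrapper that reads from the process environment.
fn read_azure_settings_from_env() -> AzureEnvSettings {
    read_azure_settings(|name| std::env::var(name).ok())
}
```

How values from the environment and from the config file are combined when both are present is not stated in the description, so the sketch leaves that step out.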

Bonus
Additionally, this PR supports the following S3 URI forms:

  • s3://{bucket}/key
  • s3a://{bucket}/key
  • https://s3.amazonaws.com/{bucket}/key
  • https://{bucket}.s3.amazonaws.com/key
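
The four S3 forms differ only in where the bucket name sits: after the scheme, as the first path segment (path-style), or as a host prefix (virtual-hosted-style). A minimal sketch of that distinction follows; `parse_s3_uri` is a hypothetical helper, not the PR's code, and it assumes the plain `s3.amazonaws.com` host shown above with no region in the host name.

```rust
/// Hypothetical helper: split one of the four S3 URI forms into (bucket, key).
fn parse_s3_uri(uri: &str) -> Option<(String, String)> {
    // s3://{bucket}/key and s3a://{bucket}/key
    for scheme in ["s3://", "s3a://"] {
        if let Some(rest) = uri.strip_prefix(scheme) {
            let (bucket, key) = rest.split_once('/')?;
            return non_empty(bucket, key);
        }
    }

    let rest = uri.strip_prefix("https://")?;
    let (host, path) = rest.split_once('/')?;

    if host == "s3.amazonaws.com" {
        // Path-style: https://s3.amazonaws.com/{bucket}/key
        let (bucket, key) = path.split_once('/')?;
        return non_empty(bucket, key);
    }

    // Virtual-hosted-style: https://{bucket}.s3.amazonaws.com/key
    let bucket = host.strip_suffix(".s3.amazonaws.com")?;
    non_empty(bucket, path)
}

/// Reject URIs whose bucket or key part is empty.
fn non_empty(bucket: &str, key: &str) -> Option<(String, String)> {
    if bucket.is_empty() || key.is_empty() {
        None
    } else {
        Some((bucket.to_string(), key.to_string()))
    }
}
```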

Closes #50

@aykut-bozkurt aykut-bozkurt marked this pull request as draft October 23, 2024 17:53
@aykut-bozkurt aykut-bozkurt marked this pull request as ready for review October 23, 2024 17:55
@aykut-bozkurt aykut-bozkurt marked this pull request as draft October 23, 2024 17:55
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/azure-blob-storage branch from fe03728 to 4dc228c Compare October 23, 2024 17:56
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/azure-blob-storage branch 3 times, most recently from b9114cf to 2feb683 Compare October 26, 2024 00:47
@aykut-bozkurt aykut-bozkurt marked this pull request as ready for review October 26, 2024 00:48
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/azure-blob-storage branch 3 times, most recently from afb3c71 to 2a3061f Compare November 9, 2024 23:27
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/azure-blob-storage branch from 2a3061f to 0a3281f Compare November 28, 2024 15:38

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 98.64865% with 5 lines in your changes missing coverage. Please review.

Project coverage is 92.35%. Comparing base (2c1a62d) to head (841f5ec).

Files with missing lines          Patch %   Lines
src/arrow_parquet/uri_utils.rs    95.90%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
+ Coverage   92.11%   92.35%   +0.23%     
==========================================
  Files          71       71              
  Lines        9109     9434     +325     
==========================================
+ Hits         8391     8713     +322     
- Misses        718      721       +3     

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/azure-blob-storage branch from 0a3281f to 80e449f Compare November 28, 2024 21:41
}

// ~/.azure/config
let azure_config_file_path = std::env::var("AZURE_CONFIG_FILE").unwrap_or(
Collaborator

why not use from_env()?

it's a bit surprising that we're inflicting the environment variables on ourselves

Collaborator Author

@aykut-bozkurt aykut-bozkurt Dec 3, 2024

For some of the tests, we need to make sure that only a subset of the env vars exists, to verify that different auth methods or scenarios work.
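
One way to picture the point about tests: if the credential lookup is a parameter, a test can expose exactly the variables it wants and nothing else. The sketch below is an assumption-laden illustration, not the PR's test code. `AzureAuth`, `pick_auth`, and `fake_env` are hypothetical names, and preferring the storage key over the SAS token when both are set is an arbitrary choice made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical: which credential a client would end up using.
#[derive(Debug, PartialEq)]
enum AzureAuth {
    AccessKey(String),
    SasToken(String),
    Missing,
}

/// Hypothetical helper: choose the auth method from whichever variables
/// `lookup` exposes. Key-before-token ordering is an assumption.
fn pick_auth(lookup: impl Fn(&str) -> Option<String>) -> AzureAuth {
    if let Some(key) = lookup("AZURE_STORAGE_KEY") {
        AzureAuth::AccessKey(key)
    } else if let Some(token) = lookup("AZURE_STORAGE_SAS_TOKEN") {
        AzureAuth::SasToken(token)
    } else {
        AzureAuth::Missing
    }
}

/// Build a lookup that sees only the given variables, so a test controls
/// exactly which subset "exists".
fn fake_env(vars: &[(&str, &str)]) -> impl Fn(&str) -> Option<String> {
    let map: HashMap<String, String> = vars
        .iter()
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect();
    move |name: &str| map.get(name).cloned()
}
```

A test for the SAS-token scenario would then call `pick_auth(fake_env(&[("AZURE_STORAGE_SAS_TOKEN", "..." )]))` without touching the process environment, which also avoids interference between tests that run in parallel.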

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/azure-blob-storage branch from 80e449f to 8553677 Compare December 3, 2024 12:04
Successfully merging this pull request may close these issues.

What would be needed to use Azure Blob Storage?