Add support for access-token secrets #64

Merged (7 commits) on Jul 18, 2024
9 changes: 9 additions & 0 deletions .github/workflows/CloudTesting.yml
@@ -16,6 +16,7 @@ jobs:
VCPKG_TOOLCHAIN_PATH: ${{ github.workspace }}/vcpkg/scripts/buildsystems/vcpkg.cmake
GEN: Ninja
DUCKDB_PLATFORM: linux_amd64
ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true

steps:
- name: Install required ubuntu packages
@@ -64,6 +65,14 @@ jobs:
az login --service-principal -u ${{secrets.AZURE_CLIENT_ID}} -p ${{secrets.AZURE_CLIENT_SECRET}} --tenant ${{secrets.AZURE_TENANT_ID}}
python3 duckdb/scripts/run_tests_one_by_one.py ./build/release/test/unittest "*test/sql/cloud/*"

- name: Test with access token
env:
AZURE_STORAGE_ACCOUNT: ${{secrets.AZURE_STORAGE_ACCOUNT}}
run: |
az login --service-principal -u ${{secrets.AZURE_CLIENT_ID}} -p ${{secrets.AZURE_CLIENT_SECRET}} --tenant ${{secrets.AZURE_TENANT_ID}}
export AZURE_ACCESS_TOKEN=`az account get-access-token --resource https://storage.azure.com --query accessToken --output tsv`
python3 duckdb/scripts/run_tests_one_by_one.py ./build/release/test/unittest "*test/sql/cloud/*"

- name: Log out azure-cli
if: always()
run: |
1 change: 1 addition & 0 deletions .github/workflows/LocalTesting.yml
@@ -19,6 +19,7 @@ jobs:
AZURE_STORAGE_CONNECTION_STRING: 'DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;QueueEndpoint=http://127.0.0.1:10001/devstoreaccount1;TableEndpoint=http://127.0.0.1:10002/devstoreaccount1;'
AZURE_STORAGE_ACCOUNT: devstoreaccount1
HTTP_PROXY_RUNNING: '1'
ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true

steps:
- uses: actions/checkout@v3
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -14,6 +14,7 @@
src/azure_extension.cpp
src/azure_secret.cpp
src/azure_filesystem.cpp
src/azure_http_state.cpp
src/azure_storage_account_client.cpp
src/azure_blob_filesystem.cpp
src/azure_dfs_filesystem.cpp
61 changes: 53 additions & 8 deletions README.md
@@ -1,14 +1,15 @@
# DuckDB Azure Extension

This extension adds a filesystem abstraction for Azure Blob Storage to DuckDB. To use it, install the latest DuckDB. The extension currently supports only **reads** and **globs**.

The easiest way to get started is by using a connection string to create a DuckDB secret:
Set up authentication (uses either the Azure CLI or a Managed Identity):
```sql
CREATE SECRET (
CREATE SECRET secret1 (
TYPE AZURE,
CONNECTION_STRING '<value>'
PROVIDER CREDENTIAL_CHAIN,
ACCOUNT_NAME '⟨storage account name⟩'
);
```
Alternatively, you can let the azure extension automatically fetch your azure credentials, check out the [docs](https://duckdb.org/docs/extensions/azure#credential_chain-provider) on how to do that.

Then to query a file on azure:
```sql
@@ -20,7 +21,45 @@ Globbing is also supported:
SELECT count(*) FROM 'azure://dummy_container/*.csv';
```

Other authentication options available:
- Connection string
```sql
CREATE SECRET secret2 (
TYPE AZURE,
CONNECTION_STRING '<value>'
);
```
- Service Principal (replace `CLIENT_SECRET` with `CLIENT_CERTIFICATE_PATH` to use a client certificate)
```sql
CREATE SECRET azure3 (
TYPE AZURE,
PROVIDER SERVICE_PRINCIPAL,
TENANT_ID '⟨tenant id⟩',
CLIENT_ID '⟨client id⟩',
CLIENT_SECRET '⟨client secret⟩',
ACCOUNT_NAME '⟨storage account name⟩'
);
```
- Access token (its audience needs to be `https://storage.azure.com`)
```sql
CREATE SECRET secret4 (
TYPE AZURE,
PROVIDER ACCESS_TOKEN,
ACCESS_TOKEN '<value>',
ACCOUNT_NAME '⟨storage account name⟩'
);
```
- Anonymous
```sql
CREATE SECRET secret5 (
TYPE AZURE,
PROVIDER CONFIG,
ACCOUNT_NAME '⟨storage account name⟩'
);
```
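
The access-token option above expects the token to be embedded in a SQL string literal. As a minimal sketch, assuming an application that assembles the statement itself, the two helpers below (`QuoteSqlLiteral` and `BuildAccessTokenSecretSql`, both hypothetical names that are not part of the extension) show how the statement could be built while doubling any embedded single quotes:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the extension): wrap a value in single
// quotes and double any embedded single quote, as SQL string literals require.
static std::string QuoteSqlLiteral(const std::string &value) {
    std::string out = "'";
    for (char c : value) {
        if (c == '\'') {
            out += "''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Hypothetical helper: build a CREATE SECRET statement for the ACCESS_TOKEN
// provider, using the option names shown in the README snippet.
static std::string BuildAccessTokenSecretSql(const std::string &secret_name,
                                             const std::string &access_token,
                                             const std::string &account_name) {
    assert(!secret_name.empty());
    return "CREATE SECRET " + secret_name +
           " (TYPE AZURE, PROVIDER ACCESS_TOKEN, ACCESS_TOKEN " + QuoteSqlLiteral(access_token) +
           ", ACCOUNT_NAME " + QuoteSqlLiteral(account_name) + ");";
}
```

The secret name is inserted unquoted here, so this sketch assumes it is a trusted identifier rather than user input.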

## Supported architectures

The extension is tested and distributed for Linux (x64, arm64), macOS (x64, arm64), and Windows (x64).

## Documentation
@@ -30,10 +69,16 @@ See the [Azure page in the DuckDB documentation](https://duckdb.org/docs/extensi
Check out the tests in `test/sql` for more examples.

## Building
This extension depends on the Azure c++ sdk. To build it, either install that manually, or use vcpkg
to do dependency management. To install vcpkg check out the docs [here](https://vcpkg.io/en/getting-started.html).
Then to build this extension run:

For development, this extension requires [CMake](https://cmake.org), Python3, a `C++11` compliant compiler, and the Azure C++ SDK. Run `make` in the root directory to compile the sources. Run `make debug` to build a non-optimized debug version. Run `make test` to verify that your version works properly after making changes. Install the Azure C++ SDK using [vcpkg](https://vcpkg.io/en/getting-started.html) and set the `VCPKG_TOOLCHAIN_PATH` environment variable when building.

```shell
VCPKG_TOOLCHAIN_PATH=<path_to_your_vcpkg_toolchain> make
sudo apt-get update && sudo apt-get install -y git g++ cmake ninja-build libssl-dev
git clone --recursive https://github.com/duckdb/duckdb_azure
git clone https://github.com/microsoft/vcpkg
./vcpkg/bootstrap-vcpkg.sh
cd duckdb_azure
GEN=ninja VCPKG_TOOLCHAIN_PATH=$PWD/../vcpkg/scripts/buildsystems/vcpkg.cmake make
```

Please also refer to our [Build Guide](https://duckdb.org/dev/building) and [Contribution Guide](https://github.com/duckdb/duckdb/blob/main/CONTRIBUTING.md).
2 changes: 1 addition & 1 deletion duckdb
Submodule duckdb updated 2482 files
2 changes: 1 addition & 1 deletion src/azure_blob_filesystem.cpp
@@ -5,7 +5,7 @@
#include "duckdb/common/exception.hpp"
#include "duckdb/common/helper.hpp"
#include "duckdb/common/shared_ptr.hpp"
#include "duckdb/common/http_state.hpp"
#include "azure_http_state.hpp"
#include "duckdb/common/file_opener.hpp"
#include "duckdb/common/string_util.hpp"
#include "duckdb/main/secret/secret.hpp"
59 changes: 59 additions & 0 deletions src/azure_http_state.cpp
@@ -0,0 +1,59 @@
#include "azure_http_state.hpp"
#include "duckdb/main/client_context.hpp"
#include "duckdb/main/query_profiler.hpp"

namespace duckdb {

void AzureHTTPState::Reset() {
head_count = 0;
get_count = 0;
put_count = 0;
post_count = 0;
total_bytes_received = 0;
total_bytes_sent = 0;
}

shared_ptr<AzureHTTPState> AzureHTTPState::TryGetState(ClientContext &context) {
auto lookup = context.registered_state.find("azure_http_state");

if (lookup != context.registered_state.end()) {
return shared_ptr_cast<ClientContextState, AzureHTTPState>(lookup->second);
}

auto http_state = make_shared_ptr<AzureHTTPState>();
context.registered_state["azure_http_state"] = http_state;
return http_state;
}

shared_ptr<AzureHTTPState> AzureHTTPState::TryGetState(optional_ptr<FileOpener> opener) {
auto client_context = FileOpener::TryGetClientContext(opener);
if (client_context) {
return TryGetState(*client_context);
}
return nullptr;
}

void AzureHTTPState::WriteProfilingInformation(std::ostream &ss) {
string read = "in: " + StringUtil::BytesToHumanReadableString(total_bytes_received);
string written = "out: " + StringUtil::BytesToHumanReadableString(total_bytes_sent);
string head = "#HEAD: " + to_string(head_count);
string get = "#GET: " + to_string(get_count);
string put = "#PUT: " + to_string(put_count);
string post = "#POST: " + to_string(post_count);

constexpr idx_t TOTAL_BOX_WIDTH = 39;
ss << "┌─────────────────────────────────────┐\n";
ss << "│┌───────────────────────────────────┐│\n";
ss << "││" + QueryProfiler::DrawPadded("Azure HTTP Stats", TOTAL_BOX_WIDTH - 4) + "││\n";
ss << "││ ││\n";
ss << "││" + QueryProfiler::DrawPadded(read, TOTAL_BOX_WIDTH - 4) + "││\n";
ss << "││" + QueryProfiler::DrawPadded(written, TOTAL_BOX_WIDTH - 4) + "││\n";
ss << "││" + QueryProfiler::DrawPadded(head, TOTAL_BOX_WIDTH - 4) + "││\n";
ss << "││" + QueryProfiler::DrawPadded(get, TOTAL_BOX_WIDTH - 4) + "││\n";
ss << "││" + QueryProfiler::DrawPadded(put, TOTAL_BOX_WIDTH - 4) + "││\n";
ss << "││" + QueryProfiler::DrawPadded(post, TOTAL_BOX_WIDTH - 4) + "││\n";
ss << "│└───────────────────────────────────┘│\n";
ss << "└─────────────────────────────────────┘\n";
}

} // namespace duckdb
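
`AzureHTTPState::TryGetState` above follows a get-or-create pattern: look the state up under a fixed key in the context's registry, and register a fresh instance on first use. A minimal sketch of that pattern, using stand-in types rather than DuckDB's own (`ContextState`, `HttpStats`, `StateRegistry`, and `GetOrCreateStats` are assumptions made for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for DuckDB's ClientContextState (assumption: only a virtual
// destructor matters for this sketch).
struct ContextState {
    virtual ~ContextState() = default;
};

// Stand-in for AzureHTTPState, carrying two of the counters from the diff.
struct HttpStats : ContextState {
    uint64_t get_count = 0;
    uint64_t total_bytes_received = 0;

    void Reset() {
        get_count = 0;
        total_bytes_received = 0;
    }
};

// Stand-in for ClientContext::registered_state.
using StateRegistry = std::unordered_map<std::string, std::shared_ptr<ContextState>>;

// Return the stats object registered under `key`, creating and registering
// it on first use so that every caller shares the same counters.
static std::shared_ptr<HttpStats> GetOrCreateStats(StateRegistry &registry, const std::string &key) {
    assert(!key.empty());
    auto lookup = registry.find(key);
    if (lookup != registry.end()) {
        return std::static_pointer_cast<HttpStats>(lookup->second);
    }
    auto stats = std::make_shared<HttpStats>();
    registry[key] = stats;
    return stats;
}
```

Because the registry hands out a shared pointer to a single instance, counters updated by one caller are visible to all others, and a `Reset()` clears them for everyone.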
32 changes: 32 additions & 0 deletions src/azure_secret.cpp
@@ -108,6 +108,31 @@ static unique_ptr<BaseSecret> CreateAzureSecretFromServicePrincipal(ClientContex
return std::move(result);
}

static unique_ptr<BaseSecret> CreateAzureSecretFromAccessToken(ClientContext &context, CreateSecretInput &input) {
auto scope = input.scope;
if (scope.empty()) {
scope.push_back("azure://");
scope.push_back("az://");
scope.push_back(AzureDfsStorageFileSystem::PATH_PREFIX);
}

auto result = make_uniq<KeyValueSecret>(scope, input.type, input.provider, input.name);

// Copy the common options that all secret types share
for (const auto *key : COMMON_OPTIONS) {
CopySecret(key, input, *result);
}

// Copy the option specific to this provider
CopySecret("access_token", input, *result);

// Redact sensitive keys
RedactCommonKeys(*result);
result->redact_keys.insert("access_token");

return std::move(result);
}

static void RegisterCommonSecretParameters(CreateSecretFunction &function) {
// Register azure common parameters
function.named_parameters["account_name"] = LogicalType::VARCHAR;
@@ -141,6 +166,7 @@ void CreateAzureSecretFunctions::Register(DatabaseInstance &instance) {
RegisterCommonSecretParameters(cred_chain_function);
ExtensionUtil::RegisterFunction(instance, cred_chain_function);

// Register the service_principal secret provider
CreateSecretFunction service_principal_function = {type, "service_principal",
CreateAzureSecretFromServicePrincipal};
service_principal_function.named_parameters["tenant_id"] = LogicalType::VARCHAR;
@@ -149,6 +175,12 @@ void CreateAzureSecretFunctions::Register(DatabaseInstance &instance) {
service_principal_function.named_parameters["client_certificate_path"] = LogicalType::VARCHAR;
RegisterCommonSecretParameters(service_principal_function);
ExtensionUtil::RegisterFunction(instance, service_principal_function);

// Register the access_token secret provider
CreateSecretFunction access_token_function = {type, "access_token", CreateAzureSecretFromAccessToken};
access_token_function.named_parameters["access_token"] = LogicalType::VARCHAR;
RegisterCommonSecretParameters(access_token_function);
ExtensionUtil::RegisterFunction(instance, access_token_function);
}

} // namespace duckdb
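
`CreateAzureSecretFromAccessToken` adds `access_token` to the secret's `redact_keys` so the token is not shown in clear text. How DuckDB renders redacted values is not shown in this diff; the sketch below only illustrates the idea with a stand-in type (`SimpleSecret` and `ToDisplayString` are hypothetical, and the `redacted` placeholder is an assumption):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Stand-in for a key-value secret. DuckDB's KeyValueSecret is richer than
// this; only the redaction idea is modelled here.
struct SimpleSecret {
    std::map<std::string, std::string> values;
    std::set<std::string> redact_keys;

    // Render "key=value" pairs separated by ';', masking every key that
    // appears in redact_keys.
    std::string ToDisplayString() const {
        std::string out;
        for (const auto &entry : values) {
            if (!out.empty()) {
                out += ";";
            }
            const bool redact = redact_keys.count(entry.first) > 0;
            out += entry.first + "=" + (redact ? std::string("redacted") : entry.second);
        }
        return out;
    }
};
```

The stored value stays intact, so the credential code can still read the token; only the display path masks it.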
69 changes: 63 additions & 6 deletions src/azure_storage_account_client.cpp
@@ -77,7 +77,7 @@ static std::string AccountUrl(const AzureParsedUrl &azure_parsed_url) {

template <typename T>
static T ToClientOptions(const Azure::Core::Http::Policies::TransportOptions &transport_options,
shared_ptr<HTTPState> http_state) {
shared_ptr<AzureHTTPState> http_state) {
static_assert(std::is_base_of<Azure::Core::_internal::ClientOptions, T>::value,
"type parameter must be an Azure ClientOptions");
T options;
@@ -94,13 +94,13 @@ static T ToClientOptions(const Azure::Core::Http::Policies::TransportOptions &tr

static Azure::Storage::Blobs::BlobClientOptions
ToBlobClientOptions(const Azure::Core::Http::Policies::TransportOptions &transport_options,
shared_ptr<HTTPState> http_state) {
shared_ptr<AzureHTTPState> http_state) {
return ToClientOptions<Azure::Storage::Blobs::BlobClientOptions>(transport_options, std::move(http_state));
}

static Azure::Storage::Files::DataLake::DataLakeClientOptions
ToDfsClientOptions(const Azure::Core::Http::Policies::TransportOptions &transport_options,
shared_ptr<HTTPState> http_state) {
shared_ptr<AzureHTTPState> http_state) {
return ToClientOptions<Azure::Storage::Files::DataLake::DataLakeClientOptions>(transport_options,
std::move(http_state));
}
@@ -112,16 +112,16 @@ ToTokenCredentialOptions(const Azure::Core::Http::Policies::TransportOptions &tr
return options;
}

static shared_ptr<HTTPState> GetHttpState(optional_ptr<FileOpener> opener) {
static shared_ptr<AzureHTTPState> GetHttpState(optional_ptr<FileOpener> opener) {
Value value;
bool enable_http_stats = false;
if (FileOpener::TryGetCurrentSetting(opener, "azure_http_stats", value)) {
enable_http_stats = value.GetValue<bool>();
}

shared_ptr<HTTPState> http_state;
shared_ptr<AzureHTTPState> http_state;
if (enable_http_stats) {
http_state = HTTPState::TryGetState(opener);
http_state = AzureHTTPState::TryGetState(opener);
}

return http_state;
@@ -197,6 +197,33 @@ CreateClientCredential(const KeyValueSecret &secret,
transport_options);
}

class AccessTokenCredential : public Azure::Core::Credentials::TokenCredential {
public:
AccessTokenCredential(const std::string& token) : Azure::Core::Credentials::TokenCredential("AccessTokenCredential") {
access_token.Token = token;
access_token.ExpiresOn = Azure::DateTime::max(); // Refreshing tokens is not supported, so setting expiry time to infinity
}

Azure::Core::Credentials::AccessToken GetToken(
Azure::Core::Credentials::TokenRequestContext const& tokenRequestContext,
Azure::Core::Context const& context) const override {
return access_token;
}

private:
Azure::Core::Credentials::AccessToken access_token;
};

static std::shared_ptr<Azure::Core::Credentials::TokenCredential>
CreateAccessTokenCredential(const KeyValueSecret &secret) {
constexpr bool error_on_missing = true;
auto access_token_val = secret.TryGetValue("access_token", error_on_missing);

std::string access_token = access_token_val.IsNull() ? "" : access_token_val.ToString();

return std::make_shared<AccessTokenCredential>(access_token);
}
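
`AccessTokenCredential` above wraps a caller-supplied token and reports an expiry of `Azure::DateTime::max()`, so the SDK never tries to refresh it. A minimal sketch of the same idea with stand-in types instead of the Azure SDK (`Token`, `TokenSource`, `StaticTokenSource`, and `IsUsable` are assumptions made for illustration, not SDK names):

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <string>
#include <utility>

// Stand-in for Azure::Core::Credentials::AccessToken (assumption).
struct Token {
    std::string value;
    std::chrono::system_clock::time_point expires_on;
};

// Stand-in for Azure::Core::Credentials::TokenCredential (assumption).
class TokenSource {
public:
    virtual ~TokenSource() = default;
    virtual Token GetToken() const = 0;
};

// Always returns the token it was constructed with. The expiry is set to the
// maximum representable time because refreshing is not supported.
class StaticTokenSource : public TokenSource {
public:
    explicit StaticTokenSource(std::string token)
        : token_ {std::move(token), std::chrono::system_clock::time_point::max()} {
    }

    Token GetToken() const override {
        return token_;
    }

private:
    Token token_;
};

// Caller-side check: a token is usable while its expiry lies in the future.
static bool IsUsable(const Token &token, std::chrono::system_clock::time_point now) {
    return now < token.expires_on;
}
```

The trade-off is that the reported expiry no longer matches the real one: once the underlying token expires, requests will start failing and the secret has to be recreated with a fresh token.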

static std::shared_ptr<Azure::Core::Http::HttpTransport>
CreateCurlTransport(const std::string &proxy, const std::string &proxy_username, const std::string &proxy_password) {
Azure::Core::Http::CurlTransportOptions curl_transport_options;
@@ -410,6 +437,32 @@ GetDfsStorageAccountClientFromServicePrincipalProvider(optional_ptr<FileOpener>
return Azure::Storage::Files::DataLake::DataLakeServiceClient(account_url, token_credential, dfs_options);
}

static Azure::Storage::Blobs::BlobServiceClient
GetBlobStorageAccountClientFromAccessTokenProvider(optional_ptr<FileOpener> opener, const KeyValueSecret &secret,
const AzureParsedUrl &azure_parsed_url) {
auto transport_options = GetTransportOptions(opener, secret);
auto token_credential = CreateAccessTokenCredential(secret);

auto account_url =
azure_parsed_url.is_fully_qualified ? AccountUrl(azure_parsed_url) : AccountUrl(secret, DEFAULT_BLOB_ENDPOINT);
auto blob_options = ToBlobClientOptions(transport_options, GetHttpState(opener));
return Azure::Storage::Blobs::BlobServiceClient(account_url, token_credential, blob_options);
}

static Azure::Storage::Files::DataLake::DataLakeServiceClient
GetDfsStorageAccountClientFromAccessTokenProvider(optional_ptr<FileOpener> opener, const KeyValueSecret &secret,
const AzureParsedUrl &azure_parsed_url) {
auto transport_options = GetTransportOptions(opener, secret);
auto token_credential = CreateAccessTokenCredential(secret);

auto account_url =
azure_parsed_url.is_fully_qualified ? AccountUrl(azure_parsed_url) : AccountUrl(secret, DEFAULT_DFS_ENDPOINT);
auto dfs_options = ToDfsClientOptions(transport_options, GetHttpState(opener));
return Azure::Storage::Files::DataLake::DataLakeServiceClient(account_url, token_credential, dfs_options);
}

static Azure::Storage::Blobs::BlobServiceClient GetBlobStorageAccountClient(optional_ptr<FileOpener> opener,
const KeyValueSecret &secret,
const AzureParsedUrl &azure_parsed_url) {
@@ -421,6 +474,8 @@ static Azure::Storage::Blobs::BlobServiceClient GetBlobStorageAccountClient(opti
return GetBlobStorageAccountClientFromCredentialChainProvider(opener, secret, azure_parsed_url);
} else if (provider == "service_principal") {
return GetBlobStorageAccountClientFromServicePrincipalProvider(opener, secret, azure_parsed_url);
} else if (provider == "access_token") {
return GetBlobStorageAccountClientFromAccessTokenProvider(opener, secret, azure_parsed_url);
}

throw InvalidInputException("Unsupported provider type %s for azure", provider);
@@ -437,6 +492,8 @@ GetDfsStorageAccountClient(optional_ptr<FileOpener> opener, const KeyValueSecret
return GetDfsStorageAccountClientFromCredentialChainProvider(opener, secret, azure_parsed_url);
} else if (provider == "service_principal") {
return GetDfsStorageAccountClientFromServicePrincipalProvider(opener, secret, azure_parsed_url);
} else if (provider == "access_token") {
return GetDfsStorageAccountClientFromAccessTokenProvider(opener, secret, azure_parsed_url);
}

throw InvalidInputException("Unsupported provider type %s for azure", provider);
2 changes: 1 addition & 1 deletion src/http_state_policy.cpp
@@ -9,7 +9,7 @@ const static std::string CONTENT_LENGTH = "content-length";

namespace duckdb {

HttpStatePolicy::HttpStatePolicy(shared_ptr<HTTPState> http_state) : http_state(std::move(http_state)) {
HttpStatePolicy::HttpStatePolicy(shared_ptr<AzureHTTPState> http_state) : http_state(std::move(http_state)) {
}

std::unique_ptr<Azure::Core::Http::RawResponse>