Feature request : add support for Fabric OneLake #43

Closed
djouallah opened this issue Jul 2, 2024 · 16 comments · Fixed by #92

Comments

@djouallah

djouallah commented Jul 2, 2024

It seems Azure is already supported using a service principal (SPN), so we only need to pass:

client_id
client_secret
tenant_id
storage_account="onelake"
endpoint_url="https://onelake.blob.fabric.microsoft.com"
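
The parameters above map fairly directly onto a DuckDB Azure secret. Below is a minimal sketch that only builds the statement text; it assumes the `service_principal` provider of the DuckDB `azure` extension, and the helper names `sql_literal` and `spn_secret_sql` as well as the secret name `onelake_spn` are hypothetical. Whether an explicit endpoint is also needed for OneLake is not settled at this point in the thread, so none is set here.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def spn_secret_sql(tenant_id: str, client_id: str, client_secret: str,
                   account_name: str = "onelake",
                   secret_name: str = "onelake_spn") -> str:
    """Build a CREATE SECRET statement for an Azure service principal.

    Assumption: the DuckDB azure extension's service_principal provider.
    secret_name is a hypothetical identifier chosen for this sketch.
    """
    return (
        f"CREATE OR REPLACE SECRET {secret_name} (\n"
        "    TYPE azure,\n"
        "    PROVIDER service_principal,\n"
        f"    TENANT_ID {sql_literal(tenant_id)},\n"
        f"    CLIENT_ID {sql_literal(client_id)},\n"
        f"    CLIENT_SECRET {sql_literal(client_secret)},\n"
        f"    ACCOUNT_NAME {sql_literal(account_name)}\n"
        ");"
    )


if __name__ == "__main__":
    # Placeholder credentials; the statement is printed, not executed.
    print(spn_secret_sql("my-tenant-id", "my-client-id", "my-client-secret"))
```

The resulting string could then be passed to `duckdb.sql(...)` once the `azure` and `delta` extensions are loaded.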
@gdubya
Contributor

gdubya commented Jul 31, 2024

I might have a chance to test / implement this soon.

Note to self: See object_store AzureConfigKey

@gdubya
Contributor

gdubya commented Aug 15, 2024

I think this is blocked by some bigger issues with the duckdb_azure plugin, i.e. duckdb/duckdb-azure#58

@djouallah
Author

It is working now, so closing the issue.

@gdubya
Contributor

gdubya commented Sep 12, 2024

This works again if we set the use_fabric_endpoint property and remove the unnecessary endpoint manipulation from https://github.com/duckdb/duckdb_delta/blob/main/src/functions/delta_scan.cpp#L348 (remove the "else").

@gdubya
Contributor

gdubya commented Sep 12, 2024

@samansmink Any reason to keep that "else"? If the endpoint is empty, then it will be handled by the underlying Rust code anyway, I believe.

@gdubya
Contributor

gdubya commented Sep 12, 2024

Also, I'm not sure if or how the duckdb org can set up an integration test against a Fabric workspace in the same way that they have the other Azure integration tests. Any suggestions, @djouallah?

@samansmink
Collaborator

> Any reason to keep that "else"?

No strong opinion; it was added simply to set the endpoint to the same one DuckDB is using.

Sorry if this is a stupid question, I have not played with Fabric yet, but looking at https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api I would expect the following to work:

CREATE SECRET az (
    TYPE azure,
    PROVIDER credential_chain,
    CHAIN 'cli',
    ACCOUNT_NAME 'onelake',
    ENDPOINT 'https://onelake.dfs.fabric.microsoft.com'
);
FROM delta_scan('abfss://<workspace>/<item>.<itemtype>/<path>/<fileName>');
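
For illustration, here is a small sketch of how the table path and query might be assembled from Python. It assumes the OneLake URI layout described in the Microsoft page linked above (`<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/...`) and a `Tables/` folder like the one that appears in the error messages later in this thread. The helper names `onelake_table_uri` and `delta_scan_sql` are hypothetical.

```python
# Assumption: the OneLake DFS host from the Microsoft OneLake access docs.
ONELAKE_DFS_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_uri(workspace: str, item: str, item_type: str, table: str) -> str:
    """Build an abfss:// URI for a Delta table stored under an item's Tables folder."""
    return f"abfss://{workspace}@{ONELAKE_DFS_HOST}/{item}.{item_type}/Tables/{table}/"


def delta_scan_sql(table_uri: str, limit: int = 1) -> str:
    """Build a delta_scan query, doubling single quotes in the path."""
    quoted = table_uri.replace("'", "''")
    return f"SELECT * FROM delta_scan('{quoted}') LIMIT {int(limit)};"


if __name__ == "__main__":
    # Placeholder workspace and table names.
    uri = onelake_table_uri("myworkspace", "storage", "Lakehouse", "test")
    print(delta_scan_sql(uri))
```

With a secret in place, the generated query could be run through `duckdb.sql(...)`.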

> Also, I'm not sure if or how the duckdb org can set up an integration test against a Fabric workspace in the same way that they have the other Azure integration tests.

I guess I need to create a DuckDB testing Fabric account and hook that up in our CI. I'm a little busy this week but I can look into that next week!

@gdubya
Contributor

gdubya commented Sep 12, 2024

> Sorry if this is a stupid question, I have not played with Fabric yet, but looking at https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api I would expect the following to work:
>
> CREATE SECRET az (
>     TYPE azure,
>     PROVIDER credential_chain,
>     CHAIN 'cli',
>     ACCOUNT_NAME 'onelake',
>     ENDPOINT 'https://onelake.dfs.fabric.microsoft.com'
> );
> FROM delta_scan('abfss://<workspace>/<item>.<itemtype>/<path>/<fileName>');

I'm not sure, but I think that unless you tell delta-rs to use the Fabric endpoint, it might still be using dfs.core.windows.net instead of dfs.fabric.microsoft.com somewhere?
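
As a rough sketch of what "telling it to use the Fabric endpoint" can look like for readers built on the Rust `object_store` crate: the option keys below are ones that crate's Azure configuration recognises, but the helper name `fabric_storage_options` is hypothetical, and whether a given reader forwards these options unchanged is an assumption.

```python
def fabric_storage_options(tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Build a string-valued options dict for an object_store-backed Azure reader.

    Assumption: the reader passes these keys through to object_store, where
    use_fabric_endpoint switches the host from *.core.windows.net to
    *.fabric.microsoft.com.
    """
    return {
        "tenant_id": tenant_id,
        "client_id": client_id,
        "client_secret": client_secret,
        # Values are strings because such option maps are typically str -> str.
        "use_fabric_endpoint": "true",
    }


if __name__ == "__main__":
    # Placeholder credentials, printed only for inspection.
    print(fabric_storage_options("my-tenant-id", "my-client-id", "my-client-secret"))
```

With the Python `deltalake` package, for example, a dict like this would be passed as `storage_options=` when opening a table.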

> I guess I need to create a DuckDB testing Fabric account and hook that up in our CI. I'm a little busy this week but I can look into that next week!

👍🏻

@djouallah
Author

It is working!!! It is so easy once you know how :)

@djouallah
Author

Twist: it works great inside a Fabric notebook, but outside Fabric, on my laptop, I get the error below. For reference, it works fine with Polars and Daft, so it is not an authentication issue.

IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 'abfss://[email protected]/storage.Lakehouse/Tables/test/'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic MicrosoftAzure error: Error performing list request: Client error with status 400 Bad Request: {"error":{"code":"IncorrectEndpointError","message":"Operation not supported on the specified endpoint"}})

@djouallah
Author

@gdubya, any idea what this error is?

IOException                               Traceback (most recent call last)
<ipython-input-15-bd66f66dbdac> in <cell line: 12>()
     10 """)
     11 duckdb.sql(" force install delta from core_nightly ")
---> 12 xxxx = duckdb.sql(f""" select *  from delta_scan('{Table_Path}') limit 1""").show()

IOException: IO Error: AzureBlobStorageFileSystem could not open file: 'abfss://[email protected]/storage.Lakehouse/Tables/newdemo/part-00001-73b72328-3c27-4632-a4fa-0b73cc055eb2-c000.snappy.parquet', unknown error occurred, this could mean the credentials used were wrong. Original error message: 'Fail to get a new connection for: https://onelake.blob.fabric.microsoft.com/. Problem with the SSL CA cert (path? access rights?)'

@gdubya
Contributor

gdubya commented Sep 14, 2024

How did you create the secret?

@gdubya
Contributor

gdubya commented Sep 14, 2024

That appears to be correct. Hmm, I'm not sure, sorry. I'll have to test it later this evening.

@djouallah
Author

Don't ask, but I got it working:

!mkdir -p /etc/pki/tls/certs
!ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt
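
The same workaround can be sketched in Python so that it is safe to re-run in a notebook. The default paths are the ones from the two commands above; the function name `ensure_ca_bundle` is hypothetical, and writing under `/etc` is assumed to be permitted (as it is for the root user on Colab).

```python
import os


def ensure_ca_bundle(source: str = "/etc/ssl/certs/ca-certificates.crt",
                     target: str = "/etc/pki/tls/certs/ca-bundle.crt") -> bool:
    """Symlink target -> source if target does not exist yet.

    Returns True when a link was created and False when target already exists.
    """
    if os.path.lexists(target):
        return False  # Leave any existing file or link untouched.
    if not os.path.isfile(source):
        raise FileNotFoundError(f"CA bundle not found: {source}")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.symlink(source, target)
    return True
```

Calling `ensure_ca_bundle()` with no arguments reproduces the `mkdir -p` and `ln -s` pair, and a second call does nothing instead of failing because the link already exists.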

@djouallah
Author

@gdubya and @samansmink thanks, that was a Colab-specific issue; it works fine on my Windows machine.

