Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Key pair auth not working for Snowflake #2192

Open
kning opened this issue Jan 6, 2025 · 4 comments
Open

Key pair auth not working for Snowflake #2192

kning opened this issue Jan 6, 2025 · 4 comments

Comments

@kning
Copy link
Contributor

kning commented Jan 6, 2025

dlt version

1.5.0

Describe the problem

I've followed the instructions here for setting up key-pair auth for Snowflake, but can't get it to work, I always get:

SerializationError: Failed to serialize exception Pipeline execution failed at stage sync with exception:

<class 'ValueError'>
('Could not deserialize key data. The data may be in an incorrect format, the provided password may be incorrect, it may be encrypted with
an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [<OpenSSLError(code=503841036,
lib=60, reason=524556, reason_text=unsupported)>]) of type <class 'dlt.pipeline.exceptions.PipelineStepFailed'>: cannot pickle 
'cryptography.hazmat.bindings._rust.openssl.OpenSSLError' object

I've stripped the headers of the private key and put it in one line before running base64 -i rsa_key.p8 and then passing that value like "snowflake://loader:<password>@kgiotue-wn98412/dlt_data?private_key=<base64 encoded pem> and it still doesn't work.

I've confirmed that the key-pair I generated works fine by testing with snowsql, so my hunch is something is happening at the base64 encoding step or on dlt's side with how it parses out the private key from the connection url.

More context here:
https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1730941780777469

Expected behavior

We're trying to incorporate dlt into our production elt. Key-pair auth for service accounts like this is best practice so it should work reliably and easily.

Steps to reproduce

If you have a dev snowflake for testing try creating a key pair and plugging it into this simple script to see if it works:

import dlt

data = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

pipeline = dlt.pipeline(
    pipeline_name="quick_start", destination=dlt.destinations.snowflake("snowflake://loader:<password>@kgiotue-wn98412/dlt_data?private_key=<base64 encoded pem>"), dataset_name="dev"
)
load_info = pipeline.run(data, table_name="users")

print(load_info)

Operating system

macOS

Runtime environment

Other

Python version

3.11

dlt data source

local

dlt destination

Snowflake

Other deployment details

Using Modal

Additional information

No response

@kning
Copy link
Contributor Author

kning commented Jan 6, 2025

more detailed stack trace:

  File "/Users/kennyning/modal-etl/venv/lib/python3.11/site-packages/dlt/destinations/impl/snowflake/configuration.py", line 39, in _decode_private_key
    pkey = serialization.load_pem_private_key(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kennyning/modal-etl/venv/lib/python3.11/site-packages/cryptography/hazmat/backends/openssl/backend.py", line 494, in _handle_key_loading_error
    raise ValueError(
ValueError: ('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [<OpenSSLError(code=503841036, lib=60, reason=524556, reason_text=unsupported)>])

seems like this line: https://github.com/dlt-hub/dlt/blob/devel/dlt/destinations/impl/snowflake/configuration.py#L32

@kning
Copy link
Contributor Author

kning commented Jan 7, 2025

Figured out the issue, here's what I did to get it to work:

  • Remove the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- headers and newlines from your private key file with this command: cat rsa_key.p8 | sed '1d;$d' | tr -d '\n'
  • Don't base64 encode this value like the docs suggest
  • Replace the + signs with url encoding %2B
  • Pass this into your connection URL and it works

My understanding of what's happening

When dlt parses the raw SQLAlchemy url you pass it, it calls make_url a SQLAlchemy function. When I breakpointed it here, it looks like SQLAlchemy actually url encodes the string you pass in but skips + signs (my guess is because there are valid + characters in some drivers e.g. mssql+pyodbc.

It decodes this url in the next line implicitly when it calls url._asdict(). This decodes the + into a space so now your private key is malformed and the decoding fails.

Possible solutions

Option 1: Make this plus sign substitution for private key values before passing to make_url
This is hacky but at least works around having to adjust SQLAlchemy dependency

Option 2: Continue advising users to base64 encode their private key (this gets rid of the plus sign issue) before passing in to the connection string and call b64 decode twice.
Once to get back to the original private key value and another one to decode it to bytes.
I don't really like this solution because it seems like an unnecessary and error prone step to make the user run an additional terminal command to base64 encode their private key.

Option 3: Fix SQLAlchemy dependency to properly handle + character in private keys.
I don't know enough about how SQLAlchemy works to know how feasible this is. This seems like the best long term solution but will take the most time since it requires making a PR to SQLAlchemy.

Let me know if this makes sense or I missed something!

@sh-rp
Copy link
Collaborator

sh-rp commented Jan 8, 2025

@kning just to confirm before I investigate further: What is the exact line that you ran to encode your private key into base64 on the first try? You should be running base64 -i <path-to-private-key-file>.pem on a pem file not a p8 file (as it looks like you are working on above)

@kning
Copy link
Contributor Author

kning commented Jan 8, 2025

hey i tried doing pem file but still getting the same error. here are the commands i ran

  1. generate private key: openssl genrsa -out rsa_key.pem 2048
  2. generate public key: openssl rsa -in rsa_key.pem -pubout -out rsa_key.pub
  3. get trimmed public key: cat rsa_key.pub | sed '1d;$d' | tr -d '\n'
  4. added public to my user: alter user modal_user set rsa_public_key = "<output from above>"
  5. confirm i can connect to my account with this key pair: snowsql -a <account_id> -u modal_user --private-key-path rsa_key.pem
  6. save trimmed private key to local file cat rsa_key.pem | sed '1d;$d' | tr -d '\n' > clean_key.pem
  7. base64 encode private key: base64 -i clean_key.pem
  8. add that value to the snowflake script i pasted above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants