
SNOW-1825608: does the CLIENT_SESSION_KEEP_ALIVE prevent an OAuth token from expiring #550

Closed
copdips opened this issue Nov 25, 2024 · 6 comments
Assignees
Labels
question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team

Comments

@copdips
Copy link

copdips commented Nov 25, 2024

What is the current behavior?

I have a one-hour lifetime OAuth token, and I create the engine with engine = Engine(URL(..., token=oauth_token, "authenticator": "oauth")), where URL is from snowflake.sqlalchemy import URL.

It seems that as long as I don't dispose of the engine, I can always query Snowflake using the same engine object.

What is the desired behavior?

Snowflake should refuse to process queries after 1 hour.

How would this improve snowflake-connector-python?

First, could you please confirm whether it's CLIENT_SESSION_KEEP_ALIVE that prevents an OAuth token from expiring, or whether the token does expire but the user session remains valid?

Additionally, is CLIENT_SESSION_KEEP_ALIVE set to True by default, since I don't specify it during engine creation?

By the way, could you please also confirm whether the keep-alive signal is sent as soon as the engine is created by engine = Engine(URL(...)), or only after calling engine.connect(), and when it stops being sent? Thanks.

@github-actions github-actions bot changed the title does the CLIENT_SESSION_KEEP_ALIVE prevent an OAuth token from expiring SNOW-1825608: does the CLIENT_SESSION_KEEP_ALIVE prevent an OAuth token from expiring Nov 25, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Nov 26, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added question Issue is a usage/other question rather than a bug status-triage Issue is under initial triage and removed feature labels Nov 26, 2024
@sfc-gh-dszmolka
Copy link
Contributor

hi, thanks for raising this question. From an initial review, it looks to be working as intended.

  1. The OAuth token is used only at authentication time; after you're authenticated to Snowflake, your client application (in this case the Snowflake Python driver, which powers snowflake-sqlalchemy) gets an authenticated token for the connection, which is valid for X amount of time and is used for submitting queries to Snowflake.
    So the user session should remain valid, even after 1 hour.

X is by default 4 hours, but enter session policies: by configuring session policies, you can define a more customized lifetime for your sessions, e.g. you can consider them invalid after 10 minutes if you wish. Or 1 hour.

  2. CLIENT_SESSION_KEEP_ALIVE is False by default.

  3. When is the keep-alive signal (we call it a heartbeat) sent: connection creation time + CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY, which is by default 1 hour. So if you connect at 12:00:00 UTC, you should expect the first heartbeat (a call to /session/heartbeat by the Python driver) at around 13:00:00 UTC. You can configure the frequency between 900s (15m) and 3600s (1h, default).

Hope this answers the questions.
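The settings discussed above can be passed to the connection through the engine. A minimal sketch, assuming the snowflake-sqlalchemy dialect forwards connect_args to the Snowflake Python connector (the engine creation itself is commented out so the snippet stays self-contained):

```python
# Connection parameters of the Snowflake Python connector controlling
# the heartbeat behaviour described above.
connect_args = {
    "client_session_keep_alive": True,  # default is False
    "client_session_keep_alive_heartbeat_frequency": 900,  # seconds; 900s-3600s allowed
}

# Hypothetical usage (requires sqlalchemy + snowflake-sqlalchemy installed):
# from sqlalchemy import create_engine
# from snowflake.sqlalchemy import URL
# engine = create_engine(URL(account="..."), connect_args=connect_args)
```

With these values the driver would send a heartbeat every 15 minutes instead of the default 1 hour.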

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Nov 26, 2024
@copdips
Copy link
Author

copdips commented Nov 26, 2024

@sfc-gh-dszmolka, thanks for your reply !

I still have a few questions about this topic:

  1. Does this mean that with all the default settings, a user must generate a new OAuth token after 4 hours and recreate the engine using engine = Engine(URL(..., token=oauth_token, "authenticator": "oauth"))?

  2. If yes: based on my tests, I was still able to query Snowflake with the initial engine object even after 5 hours. Furthermore, since CLIENT_SESSION_KEEP_ALIVE defaults to False, how was the query still able to execute? Could you clarify this behavior? Perhaps I made a mistake in my tests; I'll conduct additional tests in the coming days.

  3. When exactly does the heartbeat start being sent? One of my use cases involves using this engine inside an API. I create the engine object during API startup, but the engine.connect() call happens in each API endpoint, which might occur 10 hours or more after the API startup. I'd like to understand when the heartbeat starts and stops being sent in this scenario. Does it begin right after engine = Engine(URL(...)) or after engine.connect()?

  4. If we use plain text username and password during engine creation, does it also generate an authenticated token underneath, or does it send the original password with each query?

  5. Do you have any best practices for refreshing the OAuth2 token?

Here’s a bit of background:
We’re developing a Snowflake utility that uses Azure Service Principal (client ID and client secret) to generate an OAuth2 token with a 65-minute lifetime. The credential object (from azure.identity import ClientSecretCredential) provides a get_token() method to renew the token if it has expired, or to return the current token if it is still valid. My goal is to integrate this functionality with Snowflake SQLAlchemy in such a way that users of our utility won’t need to worry about token refreshes. The utility should handle everything internally.
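The refresh behaviour described above could be wrapped in a small cache so utility users never handle expiry themselves. This is a hypothetical sketch: TokenCache and the fetch callable are illustrative names; in the real utility the fetcher would call ClientSecretCredential(...).get_token(...) and return the token string plus its expires_on timestamp.

```python
import time

class TokenCache:
    """Hypothetical helper: returns a cached token until it nears expiry,
    then calls the supplied fetcher again (e.g. a wrapper around
    azure.identity ClientSecretCredential.get_token)."""

    def __init__(self, fetch, skew=300):
        self._fetch = fetch    # callable returning (token, expires_on_epoch_seconds)
        self._skew = skew      # refresh this many seconds before expiry
        self._token = None
        self._expires_on = 0.0

    def get(self):
        # Refresh when no token is cached yet or expiry is close.
        if self._token is None or time.time() >= self._expires_on - self._skew:
            self._token, self._expires_on = self._fetch()
        return self._token
```

A connection-time hook in the utility could then call cache.get() whenever a fresh token is needed, keeping the refresh logic invisible to callers.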

@sfc-gh-dszmolka
Copy link
Contributor

Thank you for providing the background here. This issue is starting to be a bit more than an actual bug or enhancement request filed against the snowflake-sqlalchemy library; as I see it, it doesn't even deal with this library but is more about designing a solution based on OAuth.
Which is not a problem, but this may not be the best forum to get all the information. Still, let's see what I've got for you.

  1. answer is 'no', and 'they are unrelated'. The OAuth token has its own lifetime, usually very short as a best practice (like 10m), and can be renewed (usually by calling /renew-token or a similar endpoint on the IdP) to obtain another OAuth token with the same short validity.
    The OAuth token, again, is only used upon authentication and not afterwards. So when the user creates a new (additional) connection, they should not be able to use the same OAuth token past the OAuth token lifetime.

See the Snowflake documentation for more on how OAuth in Snowflake works; this is not really related to this library.

  2. answer was 'no' ;) but I don't think this scenario should be possible. What I would recommend is to
  • enable DEBUG logging for the Python connector by adding the necessary code to your program
  • repeat your test
  • check python_connector.log (in the same directory where the program is executed) and confirm that
    • absolutely no queries were sent between the first query and the next query which you issue after 5 hours
    • absolutely no /session/heartbeat requests were sent during these 5 hours of total inactivity
      If no queries were sent, no heartbeat was sent (CLIENT_SESSION_KEEP_ALIVE=False), and you did not configure a session policy which would allow a session to stay authenticated for 5 hours or more, then please file a new issue with the reproduction steps, because this is unexpected.

(or, if CLIENT_SESSION_KEEP_ALIVE=False yet the connector is still sending /session/heartbeat requests every hour, please also file a new issue, because that is unexpected)
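Enabling DEBUG logging for the connector, as recommended above, uses only the standard library logging module; a minimal sketch (the log file name and format here are a choice, not a requirement):

```python
import logging

# Route the Snowflake Python connector's DEBUG output to a file so the
# heartbeat and query requests become visible for the test above.
logger = logging.getLogger("snowflake.connector")
logger.setLevel(logging.DEBUG)

handler = logging.FileHandler("python_connector.log")
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)
logger.addHandler(handler)
```

After running the test, grep the log for /session/heartbeat entries to confirm whether any heartbeats were sent.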

  3. engine.connect() + 1 hour (with default or unspecified CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY) (reference)

  4. as mentioned previously, the authentication method (e.g. user/pass, keypair, SSO, OAuth, etc.) and the authentication token are two independent things (I had a typo in my previous comment: it's not an authenticatED token but an authenticatiON token)

  • authentication method: might or might not involve a token (keypair = JWT, OAuth); the token format is specific to each token-based authentication method.
    This token is either generated by the client itself based on well-documented algorithms (JWT) or provided by an external identity provider (IdP, e.g. OAuth with Azure AD).
    By default, these tokens are and should be short lived (e.g. it's normal to have a 1-minute valid JWT, a 10-minute valid OAuth token, etc.), because they are very sensitive pieces of information and can be used to authenticate on behalf of a user.

Thus the 1-hour valid OAuth token you mentioned is not recommended from a security best-practices perspective, but it's up to you really.

  • authentication token (~ id token, session token, etc.): this is not generated by the client, nor provided by an external IdP. It is provided by Snowflake (the global services layer) after the client successfully authenticates itself somehow, regardless of how, even with a password.
    By default, it is valid for 4 hours. It can now be customized with session policies; I linked the reference in the previous comment.

When the Snowflake authentication token expires, you'll need to reauthenticate (somehow).

  5. please see answer 1

At this point, considering you are building a new solution, if your company is a Snowflake customer (or the company who contracted you or your company is), I highly recommend reaching out to your Snowflake Account Team and asking for a design and implementation consultation.
This is a repository for handling issues and enhancements related to the snowflake-sqlalchemy library, and what you require is, as you can see, well beyond that. I'm trying my best to give guidance, but it's by no means comprehensive and is more like general guidance.

That's why I recommend involving people from Snowflake who can provide closer and more customized guidance; and your account team can help in that.

If you don't have any further questions, perhaps we can consider closing this Issue. If you do, let me know and I'll try my best to help.

@copdips
Copy link
Author

copdips commented Nov 27, 2024

Thank you for all the clarifications so far!

Before the issue is closed, I have one last question please 🙏 :

I’m not building a utility in parallel to snowflake-sqlalchemy but rather on top of it. My utility is a high-level wrapper around snowflake-sqlalchemy, providing interfaces like get_marketing_data(), get_sales_data(), etc., for internal company usage, with snowflake-sqlalchemy working behind the scenes.

By default, once the Snowflake-specific authentication token expires after 4 hours, reauthentication is required. Assuming I have a new valid Azure OAuth token, what is the recommended way to provide this new token to snowflake-sqlalchemy?

Should I:

  • recreate a new engine object with the new OAuth token, like engine = Engine(URL(..., token=oauth_token, "authenticator": "oauth")); this is straightforward but requires recreating the engine from scratch.
  • or does the existing engine object have a method to accept the new OAuth token and handle the reauthentication process? Maybe via self._oauth_token? Currently, I'm exploring a solution using SQLAlchemy's event listeners. It seems feasible, but I'm still unsure how to reset the OAuth token when the relevant event is triggered.

@sfc-gh-dszmolka
Copy link
Contributor

after a brief consultation with my colleagues, they advised that users can define a pooled engine which always has a valid connection, recreating said connection if it expires. The engine is created something like this, with the pool:

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

example_engine = create_engine(
    URL(account=...),  # usual stuff
    pool_size=10,              # The size of the pool to maintain
    max_overflow=20,           # The number of connections to allow beyond pool_size
    pool_timeout=30,           # How long to wait before giving up on getting a connection from the pool
    pool_recycle=3600,         # How long to keep connections before recycling (in seconds)
    echo=True                  # Enable logging of all SQL commands
)

then using the do_connect event to modify how connection parameters are created:

from sqlalchemy import event

@event.listens_for(example_engine, 'do_connect')
def receive_do_connect(dialect, conn_rec, cargs, cparams):
    # get_token() is the user-supplied function returning a currently valid OAuth token
    cparams["token"] = get_token()

Hope this helps provide inspiration for the implementation.

@copdips
Copy link
Author

copdips commented Nov 27, 2024

Great, thanks a lot for the help! I also think using an event listener should be the final solution.

I'm closing this issue and will drop a comment if anything doesn't work.

@copdips copdips closed this as completed Nov 27, 2024