Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-643388: Are CTEs supported? #431

Closed
KenRoytman opened this issue Aug 11, 2022 · 4 comments
Closed

SNOW-643388: Are CTEs supported? #431

KenRoytman opened this issue Aug 11, 2022 · 4 comments
Labels
feature New feature or request

Comments

@KenRoytman
Copy link

What is the current behavior?

Using the same dataframe in a self-join results in generated SQL that has multiple copies of the same sub-query.

What is the desired behavior?

It seems beneficial to have such dataframe operations use CTEs when generating the SQL.

How would this improve snowflake-snowpark-python?

More efficient query execution.

References, Other Background

I see that the dataframe api offers the cache_result() function. However, that makes use of temp tables, and requires the end-user to explicitly call cache_result() (rather than the library figuring out when to use CTEs).

@KenRoytman KenRoytman added the feature New feature or request label Aug 11, 2022
@github-actions github-actions bot changed the title Are CTEs supported? SNOW-643388: Are CTEs supported? Aug 11, 2022
@sfc-gh-sfan
Copy link
Collaborator

Hi Ken, thanks for the feedback. Today extracting common subqueries into CTE is not supported. This is challenging to get right and requires some designs/thoughts, but it is definitely a nice optimization.

@KenRoytman
Copy link
Author

KenRoytman commented Dec 1, 2022

Thanks for the reply. Is a simpler first step to have an API that would allow the user to specify which .select() calls should be treated as CTEs?

@sfc-gh-sfan
Copy link
Collaborator

Unfortunately I don't think we have that yet. It might be useful to look into cache_result if certain operations need to be executed multiple times.

@sfc-gh-jdu
Copy link
Collaborator

CTE is supported now by automatically converting duplicate subqueries to CTEs. You can enable this feature by setting session.cte_optimization_enabled = True.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants