SNOW-1432019 Calculate subtree query complexity #1657
Can we also put this type annotation in a util file and import it from there?
I can move it to a util file - that's a good idea.
How does this compare with a simple map?
Same as a simple map, but it also lets you do map1 + map2.
Example:
map1 = {a: 1, b: 2}
map2 = {b: 1, c: 3}
map1 + map2 = {a: 1, b: 3, c: 3}
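The addition behavior described above can be demonstrated directly with Python's standard `collections.Counter` (a minimal sketch; the variable names just mirror the example above):

```python
from collections import Counter

# Counter is a dict subclass; the + operator sums counts key by key.
map1 = Counter({"a": 1, "b": 2})
map2 = Counter({"b": 1, "c": 3})

merged = map1 + map2
assert dict(merged) == {"a": 1, "b": 3, "c": 3}
```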
Hmm, if map1 + map2 is the only difference, can we simply use a map then? Right now the returned type is Counter[str], which is not straightforward to understand with a str; for example, why is complexity represented as a string? To reduce unnecessary confusion in the code, let's use a more obvious data structure for the representation.
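For comparison, a plain `dict` does not sum overlapping keys: the `|` union operator (Python 3.9+) overwrites them, so per-key summation needs an explicit loop. This is standard-library behavior, not code from this PR:

```python
map1 = {"a": 1, "b": 2}
map2 = {"b": 1, "c": 3}

# dict union keeps the right-hand value for overlapping keys:
assert (map1 | map2) == {"a": 1, "b": 1, "c": 3}

# Summing overlapping keys with plain dicts requires an explicit loop:
summed = dict(map1)
for key, value in map2.items():
    summed[key] = summed.get(key, 0) + value
assert summed == {"a": 1, "b": 3, "c": 3}
```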
I am looking at this individual_complexity_stat usage; it seems to be really just about what the current plan node category is, and there is no need to have a map there.
I added a single value in my initial implementation, but some expressions generate SQL in multiple categories, like Interval, TableFunctionJoin, and Lateral. What I can change here is to do something like this
nit: I would prefer score instead of stat
I am renaming this to individual_node_complexity and cumulative_node_complexity. Since the return value is a dictionary, adding score could also be misleading.
So when I look at those, they are just plan_statistic or plan_info to me, but I am fine with node_complexity if preferred. score is definitely misleading since we are not really calculating any score here yet.
I earlier had stat as shorthand for statistic.
No, it is not an 'estimate'; it is really the state calculation. Maybe just call it 'state'.
I am a little bit confused here; that is for join, right? Shouldn't the complexity for join be individual_complexity_stat(left) + individual_complexity_stat(right) + self?
This is the self component. A join statement can be constructed in different ways. Based on how the analyzer generates the final SQL, we calculate the contribution. You can check out the analyzer code here:
snowpark-python/src/snowflake/snowpark/_internal/analyzer/analyzer.py
Lines 845 to 868 in 4ff6d42
and here:
snowpark-python/src/snowflake/snowpark/_internal/analyzer/analyzer_utils.py
Lines 714 to 741 in 4ff6d42
I think I am still confused here. The join seems to be a LogicalPlan node; if we do not plan to do it at the plan node level, where do we plan to calculate the accumulated complexity for join then?
Some of the categories are very clear, like filter and pivot, but some are not so clear, like column, low impact, and function. Maybe add some comments there about what those categories include, for example, that all Snowflake functions and table functions are counted under the function category, etc.
done
Shall we just add one category NO_IMPACT and count how many nodes we categorized as no impact? Then essentially the total number of plan nodes is just the sum over all of those categories.
In fact, if we just call it the query plan node category, we can probably just call it "others", etc.
so you want to have both "LOW_IMPACT" and "OTHERS"?
I think we can have both. Then we have a category for each node in the plan tree, and it would be easy for us to calculate the total number of nodes in the tree.
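The scheme discussed above can be sketched roughly as follows. The enum members and function name here are hypothetical, for illustration only, and may differ from the PR's actual identifiers; the point is that with exactly one category per plan node, the total node count falls out of the per-category counts:

```python
from collections import Counter
from enum import Enum


class PlanNodeCategory(Enum):
    # Illustrative category names; every plan node gets exactly one.
    FILTER = "filter"
    PIVOT = "pivot"
    COLUMN = "column"
    FUNCTION = "function"
    LOW_IMPACT = "low_impact"
    OTHERS = "others"


def total_node_count(node_complexity: Counter) -> int:
    # Because the categories are exhaustive, summing the per-category
    # counts yields the total number of nodes in the plan tree.
    return sum(node_complexity.values())


stats = Counter({PlanNodeCategory.FILTER: 2, PlanNodeCategory.OTHERS: 5})
assert total_node_count(stats) == 7
```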