Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADBDEV-6314: Fix unexpected output for empty grouping set #1117

Open
wants to merge 6 commits into
base: adb-6.x-dev
Choose a base branch
from

Conversation

mos65o2
Copy link

@mos65o2 mos65o2 commented Nov 12, 2024

Prohibiting grouping sets with ungrouped attributes in TargetList

Problem:
Queries with grouping expressions and ungrouped attributes in TL could
produce unexpected output if the group did not have a primary key.

Example with 3 columns where "a" is a primary key:
SELECT a,b,c FROM t GROUP BY GROUPING SETS((a),(b));

Grouping "(a)" had correct output of columns "a", "b", "c". Attribute "b" in
group "(b)" is not a primary key, so it is not unique, so it can be grouped.
In this case, the output of column "c" was undefined. This also applied to
"GROUPING SETS ((a),())", ROLLUP and CUBE (detailed behavior is
described in tests with ungrouped check).
Grouping expressions were introduced in PostgreSQL 9.5 (f3d3118) and
such queries were prohibited. And this patch aligns the behavior with that.

Solution:
Improve parse-stage validation for ungrouped columns when using grouping
expressions.

Changes:
Similar to PostgreSQL, the parsing stage for queries with grouping
expressions reads common attributes for all groups. If any, they are placed in
"groupClauseCommonVars". In case of ungrouped attributes in TL, this list is
used to check for PK in groups.
Compared to the original, the "groupClauses" list uses TargetEntry rather than
an expression. Added function to get common refsortgroupref in grouping
clauses. At the beginning, the function assumes that all expressions are found
in grouping clauses. By running through the groupings, missing expressions
are excluded from the common refs list. The extraction function in the form
"list_intersection_int" taken from f3d3118.
The patch does not change the behavior of regular GROUP BY.

Co-authored-by: anarazel [email protected]

@mos65o2 mos65o2 marked this pull request as ready for review November 18, 2024 13:45
Copy link

@red1452 red1452 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote, that changes are similar to Postgres 9.5, but I did not find the same sources there. Can you give the commit, which is example of your patch? Or can you backport such commit?

@whitehawk

This comment was marked as resolved.

@whitehawk

This comment was marked as resolved.

@mos65o2
Copy link
Author

mos65o2 commented Nov 19, 2024

You wrote, that changes are similar to Postgres 9.5, but I did not find the same sources there. Can you give the commit, which is example of your patch?

f3d3118 introduces support in PG 9.5. Also in GPDB7.

Or can you backport such commit?

I discussed this issue with @Stolb27. I only need to correct the parser behavior without porting. My changes only concern the ungrouped columns check. Whereas the patch in PostgreSQL makes changes to the planner and much more.

@red1452
Copy link

red1452 commented Nov 19, 2024

When I try the functional_deps solely (with optimizer enabled), it fails on my side.
It appears that it is due to GUC optimizer_enable_table_alias. If I set it to off in the test, the test passes. For ex with sample patch below:

diff --git a/src/test/regress/sql/functional_deps.sql b/src/test/regress/sql/functional_deps.sql
index 878a595259..5026b275f9 100644
--- a/src/test/regress/sql/functional_deps.sql
+++ b/src/test/regress/sql/functional_deps.sql
@@ -2,6 +2,9 @@
 
 -- Enable logs that allow to show GPORCA failed to produce a plan
 SET optimizer_trace_fallback = on;
+-- start_ignore
+SET optimizer_enable_table_alias = off;
+-- end_ignore
 
 CREATE TEMP TABLE articles (
     id int CONSTRAINT articles_pkey PRIMARY KEY,
@@ -429,3 +432,6 @@ DROP TABLE test_table3;
 
 RESET optimizer_trace_fallback;
 RESET optimizer;
+-- start_ignore
+RESET optimizer_enable_table_alias;
+-- end_ignore

I guess that the CI pipeline passes Ok, because some other test disables this GUC, and doesn't reset it back.

It is a separate issue (as it was presented before your patch). But I believe we need to keep tests as low coupled with each other, as possible. So it is better to fix this problem some day soon. Can you please re-check on your side and create a ticket for this problem?

Tests for GPDB6 is run with "optimizer_enable_table_alias=off". You do not need to set GUC in the test.

export OPTIMIZER_ENABLE_TABLE_ALIAS=off

@mos65o2
Copy link
Author

mos65o2 commented Nov 19, 2024

In the description:

"groping extensions"

groping -> grouping

And I guess not "extensions", but "expressions". Need to update it throughout the text.

Also, it is better to use past tense when describing previous wrong behaviour, and present tense for new/current behaviour.

I fixed the typo and the use of tense. I hope the meaning of "grouping expressions" and expressions (Vars) does not overlap. I will also correct the comment in the code.
Perhaps it would be worth mentioning "grouping expressions" in the title, rather than "grouping sets", since this affects ROLLUP and CUBE.

@mos65o2 mos65o2 requested a review from red1452 November 19, 2024 12:34
@whitehawk

This comment was marked as resolved.

@mos65o2
Copy link
Author

mos65o2 commented Nov 20, 2024

It is necessary to select common attributes among top-level groupings.
This case degenerates into a simple GROUP BY, despite the absence of common attributes among the top-level GROUPING SETS, so these queries work:
create table t (a int primary key, b int, c int, d int);

select a, b, c from t group by GROUPING SETS(a), GROUPING SETS (a,()); ->
select a, b, c from t group by a, GROUPING SETS (a,());

select a, b, c from t group by GROUPING SETS(a), GROUPING SETS (b,()); ->
select a, b, c from t group by a, GROUPING SETS (b,());

It is necessary to take into account nesting. This query also degenerates into a regular GROUP BY and works:
select a, b, c from t group by GROUPING SETS (GROUPING SETS ((a), GROUPING SETS ((grouping sets(a),a)), GROUPING SETS (b,());

This query does not work:
select a, b, c from t group by GROUPING SETS (GROUPING SETS(a), GROUPING SETS (GROUPING SETS(b),a)), GROUPING SETS (b,());

With a composite key it works differently (does not work):
create table t2 (a int, b text, c int, PRIMARY KEY(a,c));
select a, b, c from t2 group by GROUPING SETS (a,c), GROUPING SETS (a,())
All queries are tested in PG 9.5.

@mos65o2
Copy link
Author

mos65o2 commented Nov 20, 2024

Followins query:

create table funcdep1(a int primary key, b int, c int, d int);
explain (costs off) select a, b, c from funcdep1 group by GROUPING SETS(a), GROUPING SETS(a, ());

while on your branch it reports an error:

I have changed the behavior. Please check. Pay attention to the tests.

Copy link

@red1452 red1452 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add note about f3d3118 to the description.

src/backend/parser/parse_agg.c Outdated Show resolved Hide resolved
src/backend/parser/parse_agg.c Outdated Show resolved Hide resolved
src/backend/parser/parse_agg.c Outdated Show resolved Hide resolved
src/backend/parser/parse_agg.c Outdated Show resolved Hide resolved
src/backend/parser/parse_agg.c Outdated Show resolved Hide resolved
@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/86899

@arenadata arenadata deleted a comment from BenderArenadata Nov 26, 2024
@arenadata arenadata deleted a comment from BenderArenadata Nov 26, 2024
@arenadata arenadata deleted a comment from BenderArenadata Nov 26, 2024
@arenadata arenadata deleted a comment from BenderArenadata Nov 26, 2024
Modify get_com_grouping_ressortgroupref_routine.
Uses list_intersection_int instead of while cycle.
The function was taken from f3d3118.
Also code refactoring.
@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/87081

* after flatten_joinalias_vars. We also find out if there are grouping
* expressions.
* Getting ressortgrouprefs of common TargetEntrys in groupings.
* TargetEntry is common if it appears in several grouping expressions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is common if it appears in several grouping expressions
* (including nesting) and in a regular GROUP BY (general case)."

Not if

"is common if it appears in all grouping expressions
* (including nesting) or in a regular GROUP BY (general case)."

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then for the case of GROUPING SETS(a) I will point out that it degenerates into a regular GROUP BY.

{
List *result = NIL;

if ( !grpcl )
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra spaces: if (!grpcl)

List *result = NIL;

if ( !grpcl )
return result;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do return NIL;, doing return result is better only if it does not empty.

Assert(IsIntegerList(list1));
Assert(IsIntegerList(list2));

result = NIL;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List *result = NIL;

foreach(lc, tles)
{
List *chunk = get_com_grouping_ressortgroupref_routine((Node *)lfirst(lc), targetList, com_refs);
if(!chunk)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space

{
TargetEntry *tle = get_sortgroupclause_tle((SortGroupClause *) grpcl, targetList);
result = lappend_int(result, tle->ressortgroupref);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add return result; here, you may remove the last else with brackets and one tab in each line inside.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants