-
Notifications
You must be signed in to change notification settings - Fork 152
[TRAFODION-3034] Support Oracle Hierarchy Query (Connect By) #1688
base: master
Are you sure you want to change the base?
Conversation
…N-3034 Conflicts: core/sql/generator/GenRelExeUtil.cpp core/sql/optimizer/RelExeUtil.h core/sql/sqlcomp/DefaultConstants.h
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2935/ |
core/sql/optimizer/BindRelExpr.cpp
Outdated
} | ||
} | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, I will remove these dead code.
Test Failed. https://jenkins.esgyn.com/job/Check-PR-master/2935/ |
Previous Test Aborted. New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2936/ |
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/2936/ |
@anoopsharma00 @zellerh @sureshsubbiah @selvaganesang could you help to take a review? |
I need a big rework of this patch. Please hold until I finish. |
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2961/ |
Test Failed. https://jenkins.esgyn.com/job/Check-PR-master/2961/ |
New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2962/ |
Test Failed. https://jenkins.esgyn.com/job/Check-PR-master/2962/ |
jenkins, retest |
New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2965/ |
Test Failed. https://jenkins.esgyn.com/job/Check-PR-master/2965/ |
New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2966/ |
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/2966/ |
What are the security considerations for this features? Do standard grant privileges work find without changes? Could one possible example for using CONNECT BY it to traverse the privilege tree. Start with the grantor being the system and proceed to through the children (where grantee == grantor). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am about 30% done with the review I think. It will likely take me another 2-3 days to complete it. I like the coding style and content very much. It is clean, easy to understand and flows well.
@@ -1515,6 +1515,9 @@ $1~String1 -------------------------------- | |||
8034 ZZZZZ 99999 ADVANCED MAJOR DBADMIN Column $0~String0 of object $1~string1 does not have a default clause but it is missing in database. This indicates inconsistent data. | |||
8035 ZZZZZ 99999 ADVANCED MAJOR DBADMIN Truncation of hive table failed. $0~String0 | |||
8036 ZZZZZ 99999 ADVANCED MINOR LOGONLY Error while creating the error logging file or logging the error row to file $0~String0: Details :$1~String1 | |||
8037 ZZZZZ 99999 BEGINNER MAJOR DBADMIN Loop detected in connect by execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In these three messages, it would help if we could give information specific to the instance when the error occurred. Maybe value of the prior column(s) as the error occurred. For the case of recursion exceeding a certain length or memory being exhausted, more details what level of recursion has been reached or how much memory has been consumed (if that information is readily available) would be helpful. This is an advisory suggestion, and by no means something to be addressed soon.
I would have expected some binder errors for statements where hierarchical constructs are not supported. It is possible though unlikely that all are caught in the parser. For example what happens if a hierarchical function or pseudocolumn is used a regular (non-hierarchical query)
{ | ||
setNodeType(ComTdb::ex_CONNECT_BY); | ||
connTableName_ = tableName; | ||
maxDeep_ = 200; //by default, max deep of a tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two literals could be consts to improve readability. It looks like there are cqds defined to make these configurable, which is excellent.
@@ -862,7 +863,7 @@ ParKeyWord ParKeyWords::keyWords_[] = { | |||
ParKeyWord("PREPARE", TOK_PREPARE, ANS_|RESWORD_), | |||
ParKeyWord("PRESERVE", TOK_PRESERVE, ANS_|RESWORD_), | |||
ParKeyWord("PRIMARY", TOK_PRIMARY, FIRST_|ANS_|RESWORD_), | |||
ParKeyWord("PRIOR", IDENTIFIER, ANS_|RESWORD_), | |||
ParKeyWord("PRIOR", PRIOR_IDENTIFIER, ANS_), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain why this is a PRIOR_IDENTIFIER and not TOK_PRIOR? Are we attempting to allow use of PRIOR as an identifier now. It was previously not allowed, since it was a RESWORD. Are we trying to have fewer reserved words? This would be an admirable goal. Thank you for explaining, this is not a defect.
@@ -13280,13 +13288,130 @@ table_expression : from_clause where_clause sample_clause | |||
SqlParser_CurrentParser->topHasOlapFunctions()); | |||
SqlParser_CurrentParser->setTopHasTDFunctions(FALSE); | |||
} | |||
| from_clause startwith_clause where_clause | |||
{ | |||
if($1->getOperatorType() == REL_JOIN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused on several aspects here. None maybe issues
-
In line 13297 the WHERE clause is passed but not in line 13310. IF branch is taken if we have a Join. Why would the WHERE clause not be passed in for a single scan? Is it because it is already addressed in the preceding rule. If yes, what benefit do lines 13308 to 13318 provide?
-
Are OLAP functions in the select list supported? Should sequence functions be supported too (since they are similar)?
-
I wonder if this whole block of code could be expressed more concisely. There seems to be some redundancy now.
/* type relx */ | ||
from_clause : TOK_FROM global_hint table_reference { $$ = $3; } | ||
| from_clause ',' table_reference | ||
{ | ||
$$ = new (PARSERHEAP()) Join($1, $3, REL_JOIN); | ||
} | ||
} | ||
startwith_clause :TOK_START_WITH search_condition CONNECT_IDENTIFIER TOK_BY search_condition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From Oracle doc "In a hierarchical query, one expression in condition must be qualified with the PRIOR operator to refer to the parent row". Is this check done later?
@@ -1439,6 +1459,7 @@ class RelExpr : public ExprNode | |||
enum Flags { | |||
EXPAND_SHORT_ROWS = 0x00000001 // expand short rows when added columns | |||
,PARENT_IS_ROOT = 0x00000002 // compressed internal format | |||
,HAS_CONNECT_BY = 0x00000004 // compressed internal format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) typo on the comment.
2) why do we need a bit flag on the RelExpr? Can we just check nullness of biConnectBy and other data members of this class? I was under the impression that we did not need bit flags in RelExpr data members in the compiler till be reached the generator. This could be an incorrect idea.
3) Can any RelExpr have a CONNECT_BY. I am assuming there are several RelExpr where this is not allowed/supported. We should add binder error for those that are not caught in the parser, if any.
@@ -4555,7 +4555,7 @@ void ValueIdSet::unparse(NAString &result, | |||
|
|||
NAString connectorText; | |||
|
|||
if ((form == MVINFO_FORMAT) || (form == QUERY_FORMAT)) | |||
if ((form == MVINFO_FORMAT) || (form == QUERY_FORMAT) || (form == CONNECT_BY_FORMAT)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the CONNECT_BY_FORMAT different from QUERY or MVINFO formats? The answer might be in other parts of code, its just that I cannot tell. Thank you for explaining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CONNECT_BY_FORMAT has only one difference from QUERY_FORMAT, that the column name only has the last part, not qualified.
The reason is: the column name will be used in the work() method of CONNECT_BY, where alias of the table name is lost. But we can make sure, there is only one table in that query in the work() method, so only the column name is required.
@@ -525,7 +540,7 @@ class RelExpr : public ExprNode | |||
// QSTUFF | |||
|
|||
// -------------------------------------------------------------------- | |||
// normalizeNode() performs predicate pushdown and also ensures | |||
// normalizeNode() performs predicate pushdown and also ensuresL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
@@ -4138,6 +4138,7 @@ class ExExeUtilLobInfoTableTcb : public ExExeUtilTcb | |||
DONE_ | |||
}; | |||
Step step_; | |||
char * data_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a change in a Lob class?
@@ -4172,6 +4173,48 @@ class ExExeUtilLobInfoPrivateState : public ex_tcb_private_state | |||
protected: | |||
}; | |||
|
|||
class connectByStackItem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is unusual in the executor to have classes that do not have a base class. How about memory management for this class. I might understand more as I see how these classes are used.
This PR is the first patch to support Oracle Hierarchy query feature (CONNECT BY).
In this PR, the feature is implemented as a new SQL utility, It is standalone, rather clear isolated with all other SQL functions, so the impact is minimal.
In the long run, we should finish the ANSI recursive feature (recursive WITH) , and at that time, this feature can be considered to refactor to use that infrastructure.
This is just the first phase of this feature.
The basic logic is simple: the utility will run a query to get all start values (specified by the START WITH clause), then it will construct queries to search for children of the root, and loop until no children can be found.
Oracle has 3 pseudo columns, to support the ISLEAF and CONNECT_BY_PATH, the utility will have to run a query for each parent, it will be rather slow. If a query doesn't have those two pseudo columns required, the utility will run in batch mode, for each iteration, get all children in one query. That will be much faster.
One can check the executor/TEST021 for how this feature works first.
This will be a long review process, there must be many places to be modified and enhanced, thank you all for help in advance.