Skip to content
This repository has been archived by the owner on Jun 7, 2021. It is now read-only.

[TRAFODION-3034] Support Oracle Hierarchy Query (Connect By) #1688

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

moscowgentalman
Copy link
Contributor

This PR is the first patch to support Oracle Hierarchy query feature (CONNECT BY).
In this PR, the feature is implemented as a new SQL utility, It is standalone, rather clear isolated with all other SQL functions, so the impact is minimal.
In the long run, we should finish the ANSI recursive feature (recursive WITH) , and at that time, this feature can be considered to refactor to use that infrastructure.
This is just the first phase of this feature.

The basic logic is simple: the utility will run a query to get all start values (specified by the START WITH clause), then it will construct queries to search for children of the root, and loop until no children can be found.

Oracle has 3 pseudo columns, to support the ISLEAF and CONNECT_BY_PATH, the utility will have to run a query for each parent, it will be rather slow. If a query doesn't have those two pseudo columns required, the utility will run in batch mode, for each iteration, get all children in one query. That will be much faster.

One can check the executor/TEST021 for how this feature works first.

This will be a long review process, there must be many places to be modified and enhanced, thank you all for help in advance.

@Traf-Jenkins
Copy link

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2935/

}
}
}
#endif
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, I will remove these dead code.

@Traf-Jenkins
Copy link

@Traf-Jenkins
Copy link

Previous Test Aborted. New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2936/

@Traf-Jenkins
Copy link

@moscowgentalman
Copy link
Contributor Author

@anoopsharma00 @zellerh @sureshsubbiah @selvaganesang could you help to take a review?

@moscowgentalman
Copy link
Contributor Author

I need a big rework of this patch. Please hold until I finish.
The current implemention takes assumption there is only one table in the from_clause, which is not correct.

@Traf-Jenkins
Copy link

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2961/

@Traf-Jenkins
Copy link

@Traf-Jenkins
Copy link

New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2962/

@Traf-Jenkins
Copy link

@moscowgentalman
Copy link
Contributor Author

jenkins, retest

@Traf-Jenkins
Copy link

New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2965/

@Traf-Jenkins
Copy link

@Traf-Jenkins
Copy link

New Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2966/

@Traf-Jenkins
Copy link

@robertamarton
Copy link
Contributor

What are the security considerations for this features? Do standard grant privileges work find without changes? Could one possible example for using CONNECT BY it to traverse the privilege tree. Start with the grantor being the system and proceed to through the children (where grantee == grantor).

Copy link
Contributor

@sureshsubbiah sureshsubbiah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am about 30% done with the review I think. It will likely take me another 2-3 days to complete it. I like the coding style and content very much. It is clean, easy to understand and flows well.

@@ -1515,6 +1515,9 @@ $1~String1 --------------------------------
8034 ZZZZZ 99999 ADVANCED MAJOR DBADMIN Column $0~String0 of object $1~string1 does not have a default clause but it is missing in database. This indicates inconsistent data.
8035 ZZZZZ 99999 ADVANCED MAJOR DBADMIN Truncation of hive table failed. $0~String0
8036 ZZZZZ 99999 ADVANCED MINOR LOGONLY Error while creating the error logging file or logging the error row to file $0~String0: Details :$1~String1
8037 ZZZZZ 99999 BEGINNER MAJOR DBADMIN Loop detected in connect by execution.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these three messages, it would help if we could give information specific to the instance when the error occurred. Maybe value of the prior column(s) as the error occurred. For the case of recursion exceeding a certain length or memory being exhausted, more details what level of recursion has been reached or how much memory has been consumed (if that information is readily available) would be helpful. This is an advisory suggestion, and by no means something to be addressed soon.

I would have expected some binder errors for statements where hierarchical constructs are not supported. It is possible though unlikely that all are caught in the parser. For example what happens if a hierarchical function or pseudocolumn is used a regular (non-hierarchical query)

{
setNodeType(ComTdb::ex_CONNECT_BY);
connTableName_ = tableName;
maxDeep_ = 200; //by default, max deep of a tree
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two literals could be consts to improve readability. It looks like there are cqds defined to make these configurable, which is excellent.

@@ -862,7 +863,7 @@ ParKeyWord ParKeyWords::keyWords_[] = {
ParKeyWord("PREPARE", TOK_PREPARE, ANS_|RESWORD_),
ParKeyWord("PRESERVE", TOK_PRESERVE, ANS_|RESWORD_),
ParKeyWord("PRIMARY", TOK_PRIMARY, FIRST_|ANS_|RESWORD_),
ParKeyWord("PRIOR", IDENTIFIER, ANS_|RESWORD_),
ParKeyWord("PRIOR", PRIOR_IDENTIFIER, ANS_),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain why this is a PRIOR_IDENTIFIER and not TOK_PRIOR? Are we attempting to allow use of PRIOR as an identifier now. It was previously not allowed, since it was a RESWORD. Are we trying to have fewer reserved words? This would be an admirable goal. Thank you for explaining, this is not a defect.

@@ -13280,13 +13288,130 @@ table_expression : from_clause where_clause sample_clause
SqlParser_CurrentParser->topHasOlapFunctions());
SqlParser_CurrentParser->setTopHasTDFunctions(FALSE);
}
| from_clause startwith_clause where_clause
{
if($1->getOperatorType() == REL_JOIN)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused on several aspects here. None maybe issues

  1. In line 13297 the WHERE clause is passed but not in line 13310. IF branch is taken if we have a Join. Why would the WHERE clause not be passed in for a single scan? Is it because it is already addressed in the preceding rule. If yes, what benefit do lines 13308 to 13318 provide?

  2. Are OLAP functions in the select list supported? Should sequence functions be supported too (since they are similar)?

  3. I wonder if this whole block of code could be expressed more concisely. There seems to be some redundancy now.

/* type relx */
from_clause : TOK_FROM global_hint table_reference { $$ = $3; }
| from_clause ',' table_reference
{
$$ = new (PARSERHEAP()) Join($1, $3, REL_JOIN);
}
}
startwith_clause :TOK_START_WITH search_condition CONNECT_IDENTIFIER TOK_BY search_condition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Oracle doc "In a hierarchical query, one expression in condition must be qualified with the PRIOR operator to refer to the parent row". Is this check done later?

@@ -1439,6 +1459,7 @@ class RelExpr : public ExprNode
enum Flags {
EXPAND_SHORT_ROWS = 0x00000001 // expand short rows when added columns
,PARENT_IS_ROOT = 0x00000002 // compressed internal format
,HAS_CONNECT_BY = 0x00000004 // compressed internal format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

) typo on the comment.
2) why do we need a bit flag on the RelExpr? Can we just check nullness of biConnectBy and other data members of this class? I was under the impression that we did not need bit flags in RelExpr data members in the compiler till be reached the generator. This could be an incorrect idea.
3) Can any RelExpr have a CONNECT_BY. I am assuming there are several RelExpr where this is not allowed/supported. We should add binder error for those that are not caught in the parser, if any.

@@ -4555,7 +4555,7 @@ void ValueIdSet::unparse(NAString &result,

NAString connectorText;

if ((form == MVINFO_FORMAT) || (form == QUERY_FORMAT))
if ((form == MVINFO_FORMAT) || (form == QUERY_FORMAT) || (form == CONNECT_BY_FORMAT))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the CONNECT_BY_FORMAT different from QUERY or MVINFO formats? The answer might be in other parts of code, its just that I cannot tell. Thank you for explaining

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONNECT_BY_FORMAT has only one difference from QUERY_FORMAT, that the column name only has the last part, not qualified.
The reason is: the column name will be used in the work() method of CONNECT_BY, where alias of the table name is lost. But we can make sure, there is only one table in that query in the work() method, so only the column name is required.

@@ -525,7 +540,7 @@ class RelExpr : public ExprNode
// QSTUFF

// --------------------------------------------------------------------
// normalizeNode() performs predicate pushdown and also ensures
// normalizeNode() performs predicate pushdown and also ensuresL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@@ -4138,6 +4138,7 @@ class ExExeUtilLobInfoTableTcb : public ExExeUtilTcb
DONE_
};
Step step_;
char * data_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a change in a Lob class?

@@ -4172,6 +4173,48 @@ class ExExeUtilLobInfoPrivateState : public ex_tcb_private_state
protected:
};

class connectByStackItem
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is unusual in the executor to have classes that do not have a base class. How about memory management for this class. I might understand more as I see how these classes are used.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants