-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix regex parser for parsing functions having SQL body with language sql (PG 15 feature) #2201
base: main
Are you sure you want to change the base?
Conversation
…guage sql (PG15 feature)
yb-voyager/cmd/analyzeSchema.go
Outdated
} else if strings.Contains(currLine, "BEGIN ATOMIC") { | ||
dollarQuoteFlag = 1 //denotes start of the sql body part https://www.postgresql.org/docs/15/sql-createfunction.html#:~:text=a%20new%20session.-,sql_body,-The%20body%20of | ||
codeBlockDelimiter = "END" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current code will not be covering the below cases:
- There are chances of
BEGIN ATOMIC
orEND
being on same line, different line, something on same line and rest on the next line.
CREATE OR REPLACE FUNCTION subtract_and_log(a INT, b INT)
RETURNS INT
LANGUAGE SQL
BEGIN ATOMIC INSERT INTO test_log(action) VALUES ('Subtracting numbers'); RETURN a - b; END;
-
Posibility of different casing(smallcase, mixed case) used for
BEGIN ATOMIC ... END;
-
Using regexp here would be right way to go here, given there are N number of posibilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are chances of BEGIN ATOMIC or END being on same line, different line, something on same line and rest on the next line.
This automatically works with the current code, added unit test case for that.
Posibility of different casing(smallcase, mixed case) used for BEGIN ATOMIC ... END;
Fixed the casing by using the BEGIN ATOMIC with regexp but END with string lower upper both
Not trying to cover a lot of cases given that we should use the pg-parser anyways but having this meanwhile to atleast able to detect the SQL body case for this PR #2165
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt we are going to have pg-parser soon for breaking the sqlfile into sql statements.
Also for oracle cases where syntax is off, (not fully converted to PG), there the parser will fail, so we could either use current approach or mix of both.
I would say lets cover most of it if we can with small efforts.
Since this SQL body thing is a part of DDL, and our import schema can also break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the SQL BODY, I have tried to cover most of the cases on the PG docs https://www.postgresql.org/docs/11/sql-createfunction.html (fixed one panic in IsEndOFSQL function). The only todo is that I am using the ToLOWER and END for codeDelimiter for SQL where we could use regex but should be okay I think.
do you have any other case in mind that I might be missing?
5595c10
to
f4e04a9
Compare
a7abfab
to
ff7fb07
Compare
@@ -134,6 +134,7 @@ var ( | |||
parserIssueDetector = queryissue.NewParserIssueDetector() | |||
multiRegex = regexp.MustCompile(`([a-zA-Z0-9_\.]+[,|;])`) | |||
dollarQuoteRegex = regexp.MustCompile(`(\$.*\$)`) | |||
sqlBodyBeginRegex = re("BEGIN", "ATOMIC") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put a ^
in the regex to ensure that BEGIN ATOMIC
is at the start of line?
Do we need to ensure that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to ensure that as it could be anywhere not necessarily in starting of the line e.g.
CREATE FUNCTION public.asterisksdfsf(n integer) RETURNS SETOF text
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE BEGIN ATOMIC
SELECT repeat('*'::text, g.g) AS repeat
FROM generate_series(1, asterisksdfsf.n) g(g);
END;
yb-voyager/cmd/analyzeSchema.go
Outdated
if strings.Contains(currLine, codeBlockDelimiter) || | ||
strings.Contains(currLine, strings.ToLower(codeBlockDelimiter)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concern here is that we don't know the variety of cases possible with plpgsql, since it can almost have anything inside the body. Here the condition is getting changed to match with either same or lowercase variant of it.
Should we just handle the sqlbody func case separately(by that i mean having different if conditions here for that) and not update the existing ones.
Just want to avoid any unknowns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the condition is getting changed to match with either same or lowercase variant of it.
the same or lowercase variant is only of the codeBlockDelimiter
variable which could be dollarQuoteRegex = regexp.MustCompile((\$.*\$)
) match or the END string for sqlBody.
But okay, I get your point about any unknowns with lowercase variant of the dollarQuoteRegex case, I can do that in separate if condition for just END case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if strings.Contains(currLine, codeBlockDelimiter) || | ||
strings.Contains(currLine, strings.ToLower(codeBlockDelimiter)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit but can we use strings.Equalfold()
here to compare both and dont care about casing(covering mixed case scenario also).
Describe the changes in this pull request
SQL body syntax - https://www.postgresql.org/docs/15/sql-createfunction.html#:~:text=a%20new%20session.-,sql_body,-The%20body%20of, where with language SQL
or
Describe if there are any user-facing changes
N/A
How was this pull request tested?
unit tests added and end-to end tests added in other PR - #2165
Does your PR have changes that can cause upgrade issues?