Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL dialect for DB2 for IBM i #658

Merged

Conversation

chrjorgensen
Copy link
Contributor

Hello.

We use your great SQL formatter utility in our Db2 for IBM i tools extension, but we find that the DB2 dialect in the formatter is for DB2 for Z (mainframe), whereas our extension is for IBM i on IBM Power server. There is quite some differences between the two platforms, which causes some problems in the SQL formatter.

So we offer this PR to add a different DB2 dialect targeted for IBM i - so we and other users of your extension can select this platform instead of IBM Z.

I'm a newbie to your code, but I have tried to copy everything Db2 related into new files for Db2i. Please verify carefully and let me know, if I missed something.

Thanks.

@chrjorgensen
Copy link
Contributor Author

It seems like the tests fails when run for the new Db2i dialect - what have I missed?

image

etc...

Copy link
Collaborator

@nene nene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Pointed out a few issues that popped up in my first quick glance.

You should keep in mind that the original DB2 implementation is likely seriously problematic. I have no idea how many people actually use it, if any. It was introduced during some very early days of SQL Formatter development and has remained largely unchanged. I Personally have never used any version of DB2.

src/languages/db2i/db2i.formatter.ts Outdated Show resolved Hide resolved
src/languages/db2i/db2i.formatter.ts Outdated Show resolved Hide resolved
src/languages/db2i/db2i.formatter.ts Show resolved Hide resolved
@nene
Copy link
Collaborator

nene commented Nov 8, 2023

I'm unsure whether having two DB2 dialects is worth it. The fact that the existing dialect targets the mainframe version is most likely just the result of googling DB2 syntax and picking the first result which happened to be the mainframe version.

I would guess that there are less people interested in using this tool with the mainframe version... but really I have no idea.

@chrjorgensen
Copy link
Contributor Author

Thanks for showing me where I made mistakes, I will fix them asap.

I didn't know about the state of the existing DB2 dialect, assumed it was okay and copied it. I'm also not familiar how the formatter works and the rules it applies, but I learn along the way.

FYI, there are three DB2 versions available: DB2 for Z (mainframe), DB2 for IBM i and DB2 for LUW (Linux, Unix, Windows). They are not the same (even though they share some codebase) and each have special functions and keywords...

@nene
Copy link
Collaborator

nene commented Nov 9, 2023

Looking through the few past issues that relate to DB2 I see the following:

The current implementation of DB2 is really a mess. I have taken some stuff (like list of keywords) from the Z-version and some other stuff (like identifier syntax) from the LUW-version.

I'm currently inclining towards throwing the existing db2 implementation away and just keeping the i-version. At least I know that somebody is actively using it, and the differences between these DB2 versions for now are really minimal.

Just FYI. This doesn't really effect this PR.

@@ -42,43 +42,43 @@ export default function supportsJoin(
const result = format(`
SELECT * FROM customers
${join} orders ON customers.customer_id = orders.customer_id
${join} items ON items.id = orders.id;
${join} items ON items.col1 = orders.col1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really the case that unquoted id can't be used as a column name in DB2 i?

When I look at the reserved word lists of LUW-version and Z-version the ID indeed doesn't appear in reserved words list.

However I still find it surprising that a) one would need to quote such a commonly used column name b) there's such a striking difference between the i-version of DB2 from other DB2 dialects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reserved words list for DB2 for IBM i shows ID as a reserved word - but I haven't tested if ID is allowed in the context of a column name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stumbled into the ID problem when I activated JOIN ... USING - suddenly the tests failed for DB2i because ID was converted to uppercase by the formatter. So I changed the id in the test to a more generic column name.

Just tested id ad a column name on DB2 for i - and it is indeed allowed... Is it better to remove ID from the reserved words list?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please test this?

I remember seeing ID in the keywords list of some other SQL dialects, but it has always turned out that it's not really a reserved keyword.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Best to just remove it from reserved words list. Given that this list has currently like 500 words, my suspicion is that many more of these keywords are actually not reserved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encountered a similar problem when trying to find out which keywords does SingleStoreDB actually reserve. Their documentation only listed all the keywords and didn't distinguish which ones are reserved and which ones were not. I ended up executing hundreds of SQL queries to see which would actually result in a failure.

@chrjorgensen
Copy link
Contributor Author

Thanks for looking into the DB2 "mess"! 😃

I've made some updates to the DB2i formatter - and it raised some questions in my head:

  • I had to remove ANY from the functions, since it collided with the ANY set operator. Any (no pun intended) way to distinguish between the two?
  • Where are the rules for identifier name specified? Is it the identspecifications in <dialect>.formatter.ts?
  • What is the priority used by the formatter? Order of specifications?

I'm curious and want to learn how the formatter works...

@chrjorgensen chrjorgensen requested a review from nene November 9, 2023 13:52
@nene
Copy link
Collaborator

nene commented Nov 9, 2023

I had to remove ANY from the functions, since it collided with the ANY set operator. Any (no pun intended) way to distinguish between the two?

Not really. I guess in both cases ANY will be followed by (, which makes distinguishing the two extra difficult. Also related to #444

Where are the rules for identifier name specified? Is it the identspecifications in <dialect>.formatter.ts?

Yes, that's correct. The identChars option defines additional characters for identifier names. The regex for matching identifiers is constructed here: https://github.com/sql-formatter-org/sql-formatter/blob/master/src/lexer/regexFactory.ts#L162C22-L162C22

What is the priority used by the formatter? Order of specifications?

The Tokenizer class defines the order of matching different types of tokens. Pretty much reading the rules listed in this class from top-to-bottom... with higher priority items at the top. From there one can see that reservedClauses is matched before reservedSelect.

But really the main thing is not to repeat the same pattern in multiple places. One notable special case is reservedPhrases which sole purpose is to have higher priority than reservedClauses, which allows overriding patterns that would otherwise be detected and formatted as separate clauses, like overriding ON UPDATE so it's not confused with the UPDATE statement.

nene added a commit that referenced this pull request Nov 9, 2023
@nene nene merged commit dd91db1 into sql-formatter-org:master Nov 9, 2023
2 checks passed
@nene
Copy link
Collaborator

nene commented Nov 9, 2023

I made small tweaks and merged this in.

@nene
Copy link
Collaborator

nene commented Nov 10, 2023

This is now released in 13.1.0.

@chrjorgensen
Copy link
Contributor Author

Thank you! And thank you for explaining about the formatter and how it works. 😃

@chrjorgensen chrjorgensen deleted the feature/add-db2-for-i-dialect branch November 10, 2023 13:52
@worksofliam
Copy link

Thanks @nene and @chrjorgensen for your work here. I am grateful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants