Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake column name case-sensitive #148

Open
0xbe7a opened this issue May 30, 2023 · 3 comments
Open

Snowflake column name case-sensitive #148

0xbe7a opened this issue May 30, 2023 · 3 comments

Comments

@0xbe7a
Copy link

0xbe7a commented May 30, 2023

I am trying to specify constraints for a Snowflake table with a schema that includes a column named "ALTER". After adding an add_uniques_equality_constraint constraint, testing the code results in a KeyError: 'alter' error.

Steps to Reproduce

  1. Create a Snowflake table with the following schema:
create or replace TABLE EXAMPLE (
	JAHR FLOAT,
	"ALTER" FLOAT
);
  1. Add a add_uniques_equality_constraint constraint:
requirement.add_uniques_equality_constraint(
    ["ALTER"], list(range(0, 100 + 1))
)
  1. Test the constraints

Expected Behavior

The code should run without errors.

Actual Behavior

Running the code results in a KeyError: 'alter' error.

Root Cause

The issue is caused by the Snowflake capitalization fix (https://github.com/Quantco/datajudge/blob/main/src/datajudge/db_access.py#L346). This fix forces the column name to be lowercase, which interferes with the case-sensitive "ALTER" column name in the table schema.

@0xbe7a 0xbe7a changed the title Snowflake column name casing Snowflake column name case-sensitive May 30, 2023
@kklein
Copy link
Collaborator

kklein commented May 31, 2023

Hi @0xbe7a - thanks for your careful write-up.
Do I guess correctly that this problem is unlikely to be linked to the fact that ALTER is a SQL keyword per se, but rather due to the fact that it is wrapped with quotes? Put differently, does the problem occur, too, if you wrap, e.g. JAHR with quotes?

create or replace TABLE EXAMPLE (
	"JAHR" FLOAT,
	age FLOAT
);

@0xbe7a
Copy link
Author

0xbe7a commented May 31, 2023

Yes, you are correct, this issue stems from the fact that "ALTER" is quoted. The quotation is required because ALTER is a SQL keyword, but "JAHR" will result in the same behaviour. This quotation is problematic here because datajudge converts all column names to lowercase. This breaks here since Double-quoted Identifiers are case-sensitive on Snowflake (https://docs.snowflake.com/en/sql-reference/identifiers-syntax). This results in the KeyError since the casing is messed up

@kklein
Copy link
Collaborator

kklein commented May 31, 2023

I see. For context, we came up with this work-around because the out-of-the box behaviour of snowflake-sqlalchemy was not what we expected it to be, see here: snowflakedb/snowflake-sqlalchemy#157

Since we are not the only people who are unhappy with the current default behaviour, others seemed to have written their internal work-arounds, too. In particular ibis seems to have implemented a client-side fix taking quoted identifiers into account:
ibis-project/ibis#5741

Unfortunately I haven't had the chance to look into the latter in greater detail. If this is something that you find relevant interesting, it'd certainly be useful if you could have a look at it in order to assess whether we could follow a similar approach with datajudge. No pressure, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants