Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1232488 Column names are converted to lowercase #157

Open
RoyalTS opened this issue Feb 19, 2020 · 12 comments
Open

SNOW-1232488 Column names are converted to lowercase #157

RoyalTS opened this issue Feb 19, 2020 · 12 comments
Assignees
Labels
bug Something isn't working status-triage_done Initial triage done, will be further handled by the driver team

Comments

@RoyalTS
Copy link

RoyalTS commented Feb 19, 2020

All column names, irrespective of their casing in the Snowflake database, seem to be converted to lower-case.

  1. What version of Python are you using (python --version)?

3.7.3

  1. What operating system and processor architecture are you using (python -c 'import platform; print(platform.platform())')?

Linux-4.15.0-1060-aws-x86_64-with-debian-buster-sid

  1. What are the component versions in the environment (pip list)?
pandas                          0.25.3
snowflake-connector-python       2.1.3
snowflake-sqlalchemy             1.2.0
  1. What did you do?
import sqlalchemy as sa
from snowflake.sqlalchemy import URL

engine = db.create_engine(URL(...))

res = engine.execute('''
SELECT
NULL::TEXT AS COL1,
NULL::TEXT AS "col2",
NULL::TEXT AS "COL3"
''')
row = res.fetchone()
row.keys()
  1. What did you expect to see?
['COL', 'col2', 'COL3']

(which is the column names I get when I grab the query result using the Snowflake Connector's fetch_pandas_all().

  1. What did you see instead?
['col1', 'col2', 'col3']
@keller00 keller00 self-assigned this Feb 19, 2020
@RoyalTS
Copy link
Author

RoyalTS commented Mar 10, 2020

@keller00 I see you self-assigned this. Does that mean you're working on this? If so, got a rough timeline?

@keller00
Copy link
Contributor

@keller00 I see you self-assigned this. Does that mean you're working on this? If so, got a rough timeline?

I will when I find some time, I'm just trying to let you know that I have looked at it. But if yo need this in a short time frame, then I'd encourage you to open a PR, this helps us out immensely even if we end up fixing bugs in the PR submitted.

This whole issue comes from the fact that if you single quote in Snowflake then it is case sensitive, while not quoting something makes the string upper case by default.
This would change behaviour of the connector and would need to be announced ahead of time.

@patrickmcdougle-okta
Copy link

This causes issues with tools like Apache Superset that use introspection and column name matching to have shared filters across multiple visualizations. Sometimes the columns are full uppercase and sometimes they aren't.

@rareal
Copy link

rareal commented Jul 10, 2021

Hi, I'm sorry for the annoying question, but are there any updates on this one?

@patrickhowerter
Copy link

Wouldn't it make more sense for it to be uppercase? Snowflake defaults to returning column names as uppercase.

@struck89
Copy link

struck89 commented Sep 6, 2021

Same issue here, but if you mix upper case and lower case it obeys:
Try:
SELECT NULL::TEXT AS COL1, NULL::TEXT AS "col2", NULL::TEXT AS "COL3", NULL::TEXT AS "Col4"

Edit: Found this: #73

It is my understanding that this can be closed, as the correct cases can be found from the cursor description.

Continuing with the code given above:
res.cursor.description

[('COL1', 2, None, 16777216, None, None, True),
 ('col2', 2, None, 16777216, None, None, True),
 ('COL3', 2, None, 16777216, None, None, True),
 ('Col4', 2, None, 16777216, None, None, True)]

Instead of using pandas (pd.read_sql) I just updated my code to run the following:

res = snow.execute(sql_stmt)
columns = [line[0] for line in res.cursor.description]
df = pd.DataFrame(res, columns = columns)

@patrickhowerter
Copy link

@keller00 @sfc-gh-abhatnagar Do you think you can assign this to someone else that works on the project? It's a pretty difficult bug to work around.

@fflakito
Copy link

+1!
I'd be very much interested in seeing a fix for this.

@cpcloud
Copy link

cpcloud commented Mar 15, 2023

Since this behavior is fundamentally broken, we had to work around this in ibis in ibis-project/ibis#5741.

If you use ibis, we do the minimum amount of munging need to make things work and we preserve the quoting behavior of snowflake instead of the broken behavior in snowflake-sqlalchemy.

@seb-kro
Copy link

seb-kro commented Apr 24, 2023

We are also maintaining internal workarounds for the broken case handling. Would be very helpful to have this fixed inside snowflake-sqlalchemy itself, especially since it is such a fundamental issue that is already known for years and should be rather straightforward to fix. It even seems that snowflake.sqlalchemy.snowdialect.SnowflakeDialect already has a reasonable implementation of normalize_name and denormalize_name. But especially denormalize_name just doesn't seem to be applied when needed.

This issue even extends to snowflake.connector, where e.g. some table names are simply converted to uppercase (connector.pandas_tools.pd_writer). So you need to maintain a workaround there as well if you want to support mixed case handling.

write_pandas(
        conn=sf_connection,
        df=df,
        # Note: Our sqlalchemy connector creates tables case insensitively
        table_name=table.name.upper(),
        schema=table.schema,
        **kwargs,
)

@sfc-gh-dszmolka
Copy link
Contributor

hi folks - apologies for leaving this unanswered for so long; we're changing that going forward.
for now, possibly

might be all originating from the same gap in snowflake-sqlalchemy. At this time, I cannot promise any timeline for taking care of this, but rest assured we're aware of the issue and i'll keep this thread posted.

@sfc-gh-dszmolka sfc-gh-dszmolka added bug Something isn't working status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Mar 13, 2024
@Catmktg-Bob-Jia
Copy link

in case anyone else getting the same confusion, it looks like

  • if column name has "_" as prefix, Pandas DataFrame column name will be all-upper-case, regardless of the Snowflake SQL
  • if column name does NOT have "_" as prefix, Pandas DataFrame column name will be all-lower-case, regardless of the Snowflake SQL

.read_sql( 'SELECT 1 AS col' )['col'] --> pass
.read_sql( 'SELECT 1 AS col' )['COL'] --> fail
.read_sql( 'SELECT 1 AS COL' )['col'] --> pass
.read_sql( 'SELECT 1 AS COL' )['COL'] --> fail

.read_sql( 'SELECT 1 AS _col' )['_col'] --> fail
.read_sql( 'SELECT 1 AS _col' )['_COL'] --> pass
.read_sql( 'SELECT 1 AS _COL' )['_col'] --> fail
.read_sql( 'SELECT 1 AS _COL' )['_COL'] --> pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests