
SNOW-219884: Pandas datetime with timezone converts to timestamp_ntz in snowflake #199

Open
kychanbp opened this issue Nov 9, 2020 · 19 comments
Labels: bug (Something isn't working), status-triage_done (Initial triage done, will be further handled by the driver team)

Comments


kychanbp commented Nov 9, 2020

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using (python --version)?
    Python 3.8.2

  2. What operating system and processor architecture are you using (python -c 'import platform; print(platform.platform())')?
    macOS-10.15.7-x86_64-i386-64bit

  3. What are the component versions in the environment (pip list)?
    Package Version


asn1crypto 1.4.0
awswrangler 1.10.0
azure-common 1.1.25
azure-core 1.8.2
azure-storage-blob 12.5.0
boto3 1.15.18
botocore 1.18.18
certifi 2020.6.20
cffi 1.14.3
chardet 3.0.4
cryptography 3.2.1
idna 2.10
isodate 0.6.0
jmespath 0.10.0
msrest 0.6.19
numpy 1.19.4
oauthlib 3.1.0
oscrypto 1.2.1
packaging 20.4
pandas 1.1.4
pip 20.2.1
psycopg2-binary 2.8.6
pyarrow 2.0.0
pycparser 2.20
pycryptodomex 3.9.9
PyJWT 1.7.1
PyMySQL 0.10.1
pyOpenSSL 19.1.0
pyparsing 2.4.7
python-dateutil 2.8.1
pytz 2020.4
requests 2.23.0
requests-oauthlib 1.3.0
s3transfer 0.3.3
setuptools 50.2.0
six 1.15.0
snowflake-connector-python 2.3.5
snowflake-sqlalchemy 1.2.4
SQLAlchemy 1.3.20
sqlalchemy-redshift 0.8.1
urllib3 1.25.11
wheel 0.35.1

  4. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.

I extracted data from MySQL and constructed a pandas DataFrame. For example:

                     transaction_date                       created_on                      modified_on
0            2017-10-04 08:00:00+00:00        2018-10-05 18:56:39+08:00        2018-10-05 18:56:39+08:00
1            2017-10-04 08:00:00+00:00        2018-10-05 18:56:39+08:00        2018-10-05 18:56:39+08:00
2            2017-10-04 08:00:00+00:00        2018-10-05 18:56:39+08:00        2018-10-05 18:56:39+08:00
3            2017-10-04 08:00:00+00:00        2018-10-05 18:56:39+08:00        2018-10-05 18:56:39+08:00
4            2017-10-04 08:00:00+00:00        2018-10-05 18:56:39+08:00        2018-10-05 18:56:39+08:00
...                                ...                              ...                              ...
33934        2020-11-05 00:00:00+00:00 2020-11-06 18:45:03.681785+08:00 2020-11-06 18:45:03.711735+08:00
33935        2020-11-05 00:00:00+00:00 2020-11-06 18:45:03.685278+08:00 2020-11-06 18:45:18.441977+08:00
33936        2020-11-05 00:00:00+00:00 2020-11-06 18:45:03.688080+08:00 2020-11-06 18:45:21.579158+08:00
33937 2020-11-09 10:00:13.190445+00:00 2020-11-09 10:00:13.195200+08:00 2020-11-09 10:00:13.195237+08:00
33938        2020-11-09 10:28:57+00:00 2020-11-09 10:30:02.515862+08:00 2020-11-09 10:30:02.515884+08:00

the column types are:

transaction_date               datetime64[ns, UTC]
created_on          datetime64[ns, Asia/Hong_Kong]
modified_on         datetime64[ns, Asia/Hong_Kong]

I used the following code to load the data into Snowflake:
data.to_sql(target_table, conn, if_exists='replace', index=False, method=pd_writer)
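A minimal sketch of the repro, assuming a live SQLAlchemy Snowflake connection `conn` and `pd_writer` for the final (commented-out) write; the DataFrame construction alone shows the tz-aware dtypes involved:

```python
import pandas as pd

# Build a small frame with the same tz-aware dtypes as the report above.
df = pd.DataFrame({
    "transaction_date": pd.to_datetime(["2017-10-04 08:00:00"]).tz_localize("UTC"),
    "created_on": pd.to_datetime(["2018-10-05 18:56:39"]).tz_localize("Asia/Hong_Kong"),
    "modified_on": pd.to_datetime(["2018-10-05 18:56:39"]).tz_localize("Asia/Hong_Kong"),
})
print(df.dtypes)  # datetime64[ns, UTC] / datetime64[ns, Asia/Hong_Kong]

# The write below reproduces the issue; it requires a live Snowflake
# connection (`conn`) and snowflake.connector.pandas_tools.pd_writer:
# df.to_sql("target_table", conn, if_exists="replace", index=False, method=pd_writer)
```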

  5. What did you expect to see?
    TIMESTAMP_TZ columns

  6. What did you see instead?
    TIMESTAMP_NTZ(9)

  7. Can you set logging to DEBUG and collect the logs?

import logging
import os

for logger_name in ['snowflake.sqlalchemy', 'snowflake.connector', 'botocore']: 
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
@github-actions github-actions bot changed the title Pandas datetime with timezone converts to timestamp_ntz in snowflake SNOW-219884: Pandas datetime with timezone converts to timestamp_ntz in snowflake Nov 9, 2020
@kychanbp (Author)

Any idea why this happens?

@kychanbp (Author)

I inspected the SQLAlchemy log file. The generated SQL statement is:

CREATE TABLE x_ap_account_transactions (
	"ID" BIGINT, 
	"OBJECT_PK" BIGINT, 
	"OBJECT_REF" TEXT, 
	"TRANSACTION_DESC" TEXT, 
	"TRANSACTION_TYPE" TEXT, 
	"TRANSACTION_DATE" TIMESTAMP, 
	"ORIGINAL_FEE" FLOAT, 
	"ORIGINAL_CURRENCY" TEXT, 
	"EFFECTIVE_FEE" FLOAT, 
	"EFFECTIVE_CURRENCY" TEXT, 
	"IS_SUSPENSION_TRANSACTION" BIGINT, 
	"CREATED_ON" TIMESTAMP, 
	"MODIFIED_ON" TIMESTAMP, 
	"STATUS" TEXT, 
	"ACCOUNT_ID" BIGINT
)

It seems the statement creates plain `TIMESTAMP` columns, not timestamps with timezone information.

@kychanbp (Author)

Or should I set the timestamp type mapping described here: https://docs.snowflake.com/en/sql-reference/parameters.html#timestamp-type-mapping?

@kychanbp (Author)

I think I solved it by setting the timestamp type mapping and the session timezone:

conn.execute("ALTER SESSION SET TIMESTAMP_TYPE_MAPPING = 'TIMESTAMP_TZ';")
conn.execute("ALTER SESSION SET TIMEZONE = 'UTC';")

@lightupyiqian

Saw the same problem using pandas to_sql. Looks like a bug, as the docs state:

"Timezone aware datetime columns will be written as Timestamp with timezone type with SQLAlchemy if supported by the database. Otherwise, the datetimes will be stored as timezone unaware timestamps local to the original timezone."
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

@kychanbp (Author)

@lightupyiqian But it seems Snowflake supports timestamps with timezone? I'd expect pd_writer to handle that.

@lightupyiqian

> @lightupyiqian But it seems Snowflake supports timestamps with timezone? I'd expect pd_writer to handle that.

Yeah, it should, but it has other problems, like not quoting identifiers properly.


Nazarii commented Nov 19, 2020

This bug was introduced in v1.1.18 and also affects sqlalchemy.DateTime(timezone=True) columns: in that case the column is passed to Snowflake as DATETIME, which is an alias for TIMESTAMP_NTZ in Snowflake. The previous v1.1.17 logic created a TIMESTAMP_TZ column.
v1.1.17...v1.1.18
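As a sketch of the expected behavior, here is what DateTime(timezone=True) compiles to under a SQLAlchemy dialect that supports tz-aware timestamps (PostgreSQL is used purely for illustration, since the Snowflake dialect's mapping is the regression under discussion):

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

# On a tz-aware-capable dialect, DateTime(timezone=True) compiles to a
# timezone-aware column type. The Snowflake dialect regression described
# above instead emits DATETIME, which Snowflake treats as TIMESTAMP_NTZ.
ddl_type = sa.DateTime(timezone=True).compile(dialect=postgresql.dialect())
print(ddl_type)  # TIMESTAMP WITH TIME ZONE
```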


toandm commented Sep 8, 2022

Hi guys, this issue still persists with snowflake-connector-python==2.7.11. My DataFrame column dtype is datetime64[ns, UTC], but when I use df.to_sql to write to Snowflake, the resulting datatype is TIMESTAMP_NTZ. Any ideas how to fix this?

@ChuliangXiao

Having the same issue: pandas datetime64[ns, America/New_York] becomes Snowflake TIMESTAMP_NTZ(9) when using Snowpark session.write_pandas.


El-Carverino commented Dec 10, 2022

I am also encountering the same problem.

  • I have a timezone-aware Python datetime object: start_date = datetime.now(tz=timezone.utc).
  • I have a Snowflake procedure, which expects a TIMESTAMP_TZ argument.
  • I am calling that Snowflake procedure, using Snowpark session.call, passing the above start_date variable as the argument.
  • Contrary to the documentation, the Snowflake data type becomes TIMESTAMP_NTZ, and when handled by the procedure, it "converts" the value (back) to TIMESTAMP_TZ, to match the expected type, which incorrectly alters the timestamp value.
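To illustrate why that round trip alters the value, here is a stdlib-only sketch of the coercion described above (an illustration of the arithmetic, not Snowflake's actual internals): dropping the tzinfo keeps only the wall-clock time, so reattaching a different zone shifts the real instant.

```python
from datetime import datetime, timedelta, timezone

hkt = timezone(timedelta(hours=8))
aware = datetime(2020, 11, 9, 10, 30, tzinfo=hkt)   # 10:30 +08:00 == 02:30 UTC
naive = aware.replace(tzinfo=None)                  # wall clock only, like TIMESTAMP_NTZ
reinterpreted = naive.replace(tzinfo=timezone.utc)  # "convert back" by attaching UTC
shift = reinterpreted - aware                       # the instant has drifted
print(shift)  # 8:00:00
```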

@ed-openstore

Has anyone found a true fix for this? It still persists using SQLAlchemy and pd_writer.


sabrinazhengliu commented Jan 5, 2023

I'm having the same issue using a SQLAlchemy engine and df.to_sql(). The source data is a pandas timestamp with local timezone information; after writing to Snowflake it becomes TIMESTAMP_NTZ. Is anyone working on this issue?

Meanwhile, I was able to get the desired data types by explicitly creating the table with column definitions before ingesting data.
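That workaround can be sketched as follows (the table and column names are illustrative, matching the example frame earlier in the thread): create the table with explicit TIMESTAMP_TZ columns first, then load with if_exists='append' so pandas does not recreate the table.

```sql
-- Illustrative DDL; adjust the table/column names to your schema.
CREATE TABLE IF NOT EXISTS target_table (
    "TRANSACTION_DATE" TIMESTAMP_TZ,
    "CREATED_ON"       TIMESTAMP_TZ,
    "MODIFIED_ON"      TIMESTAMP_TZ
);
```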

@jaycgibbs

Also having this issue: a pandas DataFrame with correct timezone offsets is converted to NTZ when sent to Snowflake. I'm using a SQLAlchemy engine and df.to_sql().


JayP718 commented Oct 19, 2023

bump

@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Mar 20, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Mar 20, 2024
@sfc-gh-dszmolka (Contributor)

Hi, and apologies for taking so long to look into this; we're going to change that going forward. Checking this issue now. Thank you @Nazarii for your reproduction on the linked issue!

@sfc-gh-dszmolka (Contributor)

Unfortunately, it looks like this long-standing regression is still there :( and switching to sqlalchemy.TIMESTAMP does not help either; the query is still generated as updated_at TIMESTAMP, the same as the original updated_at datetime; both are mapped to the TIMESTAMP_NTZ Snowflake datatype by default.

We're going to take a look and fix this.

until then, the workarounds already suggested in this issue are available to use:

  • setting TIMESTAMP_TYPE_MAPPING to e.g. TIMESTAMP_TZ prior to creating the table with the timestamp column
  • creating the table with explicit TIMESTAMP_TZ column before ingesting data into it

Thank you for bearing with us!

@sfc-gh-dszmolka sfc-gh-dszmolka added bug Something isn't working status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Mar 20, 2024

Nazarii commented Mar 20, 2024

@sfc-gh-dszmolka Thanks. BTW, I have a PR that fixes this issue; it has been open for a long time and now has merge conflicts that need to be resolved.

@sfc-gh-dszmolka (Contributor)

Thanks @Nazarii for the contribution! I added some reviewers from the connector team in the hope that the review process can be sped up.

However, as you mentioned, the conflicts still need to be fixed eventually; do you think that would be possible, or could you even submit a new PR based on the current main? There have probably been tons of changes since the PR was originally submitted.
