SNOW-1269037: NaT (Not a Time) not parsed as NULL when session.create_dataframe (throws exception instead) #1331

Closed
cvmartin opened this issue Mar 25, 2024 · 8 comments
Assignees
Labels
local testing Local Testing issues/PRs status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. status-triage_done Initial triage done, will be further handled by the driver team

Comments

@cvmartin

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    3.11.7

  2. What are the Snowpark Python and pandas versions in the environment?

    pandas==1.5.3
    snowflake-snowpark-python==1.12.0

  3. What did you do?

The following code works as expected:

from snowflake.snowpark.types import StructType, StructField, TimestampType

from snowflake.snowpark import Session
import pandas as pd

session = Session.builder.config('local_testing', True).create()

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            utc=True
        )
    }
)

sf_df = session.create_dataframe(
    data = df,
    schema = StructType([StructField("date", TimestampType())])
)
sf_df.schema

I get StructType([StructField('"date"', TimestampType(), nullable=True)]). Specifying the schema is not even necessary. What is important is specifying utc=True; otherwise the column is coerced to a LongType().
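To make the role of utc=True concrete, here is a minimal pandas-only sketch of what changes on the input side. It only inspects the pandas index; how Snowpark maps each dtype to a column type is as reported above and is not exercised here. The variable names `naive` and `aware` are illustrative.

```python
import pandas as pd

dates = ["2020-01-02", "2020-01-13"]

# Without utc=True the result is timezone-naive.
naive = pd.to_datetime(dates)

# With utc=True the result is timezone-aware (UTC).
aware = pd.to_datetime(dates, utc=True)

print(naive.dtype, naive.tz)
print(aware.dtype, aware.tz)
```

Checking `.tz` on the column before calling session.create_dataframe is a quick way to confirm which of the two cases applies.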

Now, if one of the values of the date vector is coerced into a NaT value (not a time):

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [None, "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            utc=True
        )
    }
)

sf_df = session.create_dataframe(
    data = df,
    schema = StructType([StructField("date", TimestampType())])
)

The snippet above throws a very clear exception, indicating that NaTs are not supported:

TypeError: not supported type: <class 'pandas._libs.tslibs.nattype.NaTType'>
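Until a fix is available, one possible workaround is to replace NaT with None before handing the data over. The sketch below is an assumption, not the library's own handling: `nat_to_none_rows` is a hypothetical helper that turns a pandas DataFrame into plain Python row tuples, mapping any missing value (NaT or NaN) to None and pandas Timestamps to standard datetime objects. Whether the Local Testing session accepts the resulting rows is not verified here.

```python
import pandas as pd


def nat_to_none_rows(pdf: pd.DataFrame) -> list:
    """Hypothetical helper: convert a DataFrame to row tuples with missing values as None."""
    rows = []
    for record in pdf.itertuples(index=False, name=None):
        cleaned = []
        for value in record:
            if pd.isna(value):
                # NaT and NaN both count as missing in pandas.
                cleaned.append(None)
            elif isinstance(value, pd.Timestamp):
                # Use the standard-library datetime instead of the pandas type.
                cleaned.append(value.to_pydatetime())
            else:
                cleaned.append(value)
        rows.append(tuple(cleaned))
    return rows


pdf = pd.DataFrame(
    {"date": pd.to_datetime([None, "2020-01-13", "2020-02-01"], utc=True)}
)
rows = nat_to_none_rows(pdf)
print(rows)
```

The resulting `rows` list could then be passed as `data` to session.create_dataframe together with the explicit schema, instead of the pandas DataFrame itself.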

  4. What did you expect to see?

This behavior does not occur outside the Local Testing framework. With a real Snowflake session, the dataframe is created properly and NaT is parsed as NULL:

-----------------------------
|"date"                     |
-----------------------------
|NULL                       |
|2020-01-12 16:00:00-08:00  |
|2020-01-31 16:00:00-08:00  |
|2020-02-22 16:00:00-08:00  |
|2020-03-04 16:00:00-08:00  |
-----------------------------

Also, NaN values work fine inside the Local Testing framework. In other words, if the snippet above had used a numeric column, it would have worked. The problem arises only with time values (NaT).
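A short pandas-only illustration of why NaN and NaT can behave differently in a type-based converter: both are reported as missing by pd.isna, but they are different Python types, so code that dispatches on the value's type has to handle each one. This is a sketch of the general point, not a description of Snowpark's internals.

```python
import pandas as pd

nan_value = float("nan")
nat_value = pd.NaT

# Both are treated as missing values by pandas.
print(pd.isna(nan_value), pd.isna(nat_value))

# But they are different types: NaN is a float, NaT has its own type.
print(type(nan_value).__name__, type(nat_value).__name__)
```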

@cvmartin cvmartin added bug Something isn't working local testing Local Testing issues/PRs needs triage Initial RCA is required labels Mar 25, 2024
@github-actions github-actions bot changed the title NaT (Not a Time) not parsed as NULL when session.create_dataframe (throws exception instead) SNOW-1269037: NaT (Not a Time) not parsed as NULL when session.create_dataframe (throws exception instead) Mar 25, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Mar 25, 2024
@sfc-gh-sghosh sfc-gh-sghosh added status-triage Issue is under initial triage and removed bug Something isn't working needs triage Initial RCA is required labels Mar 25, 2024
@sfc-gh-sghosh

Hello @cvmartin ,

Thanks for raising the issue, we are looking into it.

Regards,
Sujan

@sfc-gh-sghosh

sfc-gh-sghosh commented Mar 27, 2024

Hello @cvmartin ,

I tried the above sample code snippet with Snowpark 1.14.0 and it does not throw any error. The code and output are as follows.

code:
`
from snowflake.snowpark.types import StructType, StructField, TimestampType
from snowflake.snowpark import Session
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [None, "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            utc=True
        )
    }
)

sf_df = session.create_dataframe(
    data=df,
    schema=StructType([StructField("date", TimestampType())])
)

sf_df.schema
`

Output:
StructType([StructField('"date"', TimestampType(tz=ltz), nullable=True)])

In both cases, whether I use None or a valid date value, the output is the same as above.

Regards,
Sujan

@cvmartin
Author

cvmartin commented Apr 5, 2024

Hello @sfc-gh-sghosh.

Thanks for your reply. I believe the code you provided does not use the Local Testing Framework, which is where I find the problem.

Specifically, it is missing the line

session = Session.builder.config('local_testing', True).create()

That line creates a Local Testing Framework session named session (without it, the code fails with NameError: name 'session' is not defined).

The complete code looks like this:

from snowflake.snowpark.types import StructType, StructField, TimestampType
from snowflake.snowpark import Session
import pandas as pd

session = Session.builder.config('local_testing', True).create()


df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [None, "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            utc=True
        )
    }
)

sf_df = session.create_dataframe(
    data=df,
    schema=StructType([StructField("date", TimestampType())])
)

sf_df.schema

With this code I can reproduce the error, including with snowflake-snowpark-python==1.14.0.

@sfc-gh-sghosh
Copy link

Thank you @cvmartin ,

We are able to reproduce the issue. We are looking into it and will update here.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Apr 14, 2024
@sfc-gh-sghosh

Hello @cvmartin ,

The team is working on the fix via #1393

Regards,
Sujan

@cvmartin
Author

Thanks a lot!

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-pr_pending_merge A PR is made and is under review label Apr 29, 2024
@sfc-gh-dszmolka

The PR is merged and will be part of the next release.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. and removed status-pr_pending_merge A PR is made and is under review labels May 2, 2024
@sfc-gh-aling
Contributor

I'm closing the issue as the PR has been merged and released. Please feel free to reach out if you still see issues.
