SNOW-1368516: Creating a dataframe from pandas dataframe with datetime objects fail to set the right column type #1522

Open
sfc-gh-stan opened this issue May 6, 2024 · 1 comment
Labels
bug Something isn't working local testing Local Testing issues/PRs status-triage_done Initial triage done, will be further handled by the driver team

Comments

@sfc-gh-stan
Collaborator

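import datetime

import pandas as pd

# `session` is assumed to be an existing Snowpark Session created in Local Testing mode.
data = {
    "pandas_datetime": ["2021-09-30 12:00:00", "2021-09-30 13:00:00"],
    "date": [pd.to_datetime("2010-1-1"), pd.to_datetime("2011-1-1")],
    "datetime.datetime": [
        datetime.datetime(2010, 1, 1),
        datetime.datetime(2010, 1, 1),
    ],
}
pdf = pd.DataFrame(data)
# Convert the string column to datetime64 so all three columns hold datetimes.
pdf["pandas_datetime"] = pd.to_datetime(pdf["pandas_datetime"])
df = session.create_dataframe(pdf)
print(df.schema)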

prints

StructType([StructField('"pandas_datetime"', LongType(), nullable=True), StructField('"date"', LongType(), nullable=True), StructField('"datetime.datetime"', LongType(), nullable=True)])

All three columns are reported as LongType even though they hold datetime values. This can be traced to the utility function src/snowflake/snowpark/mock/_pandas_util.py::_extract_schema_and_data_from_pandas_df, which extracts the schema incorrectly. The bug blocks the test tests/integ/test_dataframe.py::test_create_dataframe_with_pandas_df from being enabled to run against Local Testing.
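For illustration only, here is a minimal sketch of how a schema extractor could tell datetime columns apart from integer columns by inspecting pandas dtypes. It is not the Snowpark implementation: `classify_columns` and the labels it returns are hypothetical names introduced for this example, and the only library calls used are `pd.api.types.is_datetime64_any_dtype` and `pd.api.types.is_integer_dtype`.

```python
import datetime

import pandas as pd


def classify_columns(pdf: pd.DataFrame) -> dict:
    """Hypothetical helper: label each column by inspecting its pandas dtype."""
    kinds = {}
    for name, dtype in pdf.dtypes.items():
        # Check datetime first so datetime64 columns are never treated as integers.
        if pd.api.types.is_datetime64_any_dtype(dtype):
            kinds[name] = "timestamp"
        elif pd.api.types.is_integer_dtype(dtype):
            kinds[name] = "long"
        else:
            kinds[name] = "other"
    return kinds


# Same data as in the report above.
pdf = pd.DataFrame(
    {
        "pandas_datetime": ["2021-09-30 12:00:00", "2021-09-30 13:00:00"],
        "date": [pd.to_datetime("2010-1-1"), pd.to_datetime("2011-1-1")],
        "datetime.datetime": [
            datetime.datetime(2010, 1, 1),
            datetime.datetime(2010, 1, 1),
        ],
    }
)
pdf["pandas_datetime"] = pd.to_datetime(pdf["pandas_datetime"])

kinds = classify_columns(pdf)
print(kinds)
```

With this dtype-based check, all three columns in the reproduction are labeled as timestamp columns. A real fix would still have to map them to the appropriate Snowpark type and convert the data accordingly.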

@sfc-gh-stan sfc-gh-stan added bug Something isn't working needs triage Initial RCA is required local testing Local Testing issues/PRs labels May 6, 2024
@github-actions github-actions bot changed the title Creating a dataframe from pandas dataframe with datetime objects fail to set the right column type SNOW-1368516: Creating a dataframe from pandas dataframe with datetime objects fail to set the right column type May 6, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this May 7, 2024

sfc-gh-sghosh commented May 7, 2024

Hello @sfc-gh-stan ,

Thanks for raising the issue.

Yes, the schema information is incorrect with Local Testing compared to the default session.

default session:
StructType([StructField('"pandas_datetime"', TimestampType(tz=ntz), nullable=True), StructField('"date"', TimestampType(tz=ntz), nullable=True), StructField('"datetime.datetime"', TimestampType(tz=ntz), nullable=True)])

local_testing:
StructType([StructField('"pandas_datetime"', LongType(), nullable=True), StructField('"date"', LongType(), nullable=True), StructField('"datetime.datetime"', LongType(), nullable=True)])

We will work on fixing it and post an update.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed needs triage Initial RCA is required labels May 7, 2024