Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1511601: [local testing] Snowpark 1.18.0 broke inserts using current_timestamp() in timestamp columns #1856

Closed
mraraujo opened this issue Jun 28, 2024 · 9 comments
Assignees
Labels
bug Something isn't working local testing Local Testing issues/PRs status-triage_done Initial triage done, will be further handled by the driver team

Comments

@mraraujo
Copy link

Code such as the following works in 1.17.0 but throws an error in 1.18.0:

        merge_result = df.merge(
            source,
            cast(Column, (df['id'] == source['id'])),
            [when_not_matched().insert({
                'id': source['id'],
                'last_update': current_timestamp()
            })]
        )

The error is

    def write_table(
        self, name: Union[str, Iterable[str]], table: TableEmulator, mode: SaveMode
    ) -> Row:
        for column in table.columns:
            if not table[column].sf_type.nullable and table[column].isnull().any():
>               raise SnowparkLocalTestingException(
                    "NULL result in a non-nullable column"
                )
E               snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException: NULL result in a non-nullable column
@mraraujo mraraujo added bug Something isn't working needs triage Initial RCA is required labels Jun 28, 2024
@github-actions github-actions bot changed the title [local testing] Snowpark 1.18.0 broke inserts using current_timestamp() in timestamp columns SNOW-1511601: [local testing] Snowpark 1.18.0 broke inserts using current_timestamp() in timestamp columns Jun 28, 2024
@mraraujo
Copy link
Author

mraraujo commented Jun 28, 2024

Note that replacing current_timestamp() with datetime.today() succeeds, but the behavior is not equivalent outside of local testing as current_timestamp() should run in the warehouse.

@sfc-gh-sghosh
Copy link

Hello @mraraujo ,

Thanks for raising the issue, we are looking into it, will update.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage Issue is under initial triage and removed needs triage Initial RCA is required bug Something isn't working labels Jul 1, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Jul 1, 2024
@sfc-gh-sghosh
Copy link

Hello @mraraujo ,

Could you share the code snippet so we can reproduce the issue.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added the status-information_needed Additional information is required from the reporter label Jul 2, 2024
@mraraujo
Copy link
Author

mraraujo commented Jul 2, 2024

A small reproducible example:

from snowflake.snowpark import Session, Column
from snowflake.snowpark.functions import current_timestamp, when_not_matched
from snowflake.snowpark.types import StructType, StructField, StringType, TimestampType, TimestampTimeZone
import datetime as dt
from typing import cast

session = Session.builder.config('local_testing', True).create()

target_data = [
    ("id1", dt.datetime(year=2024, month=3, day=1)),
]
target_schema = StructType([
    StructField("id", StringType()),
    StructField("last_update", TimestampType(TimestampTimeZone.NTZ)),
])

source_data = [
    ("id1"),
    ("id2"),
]

target = session.create_dataframe(target_data, schema=target_schema)
target.write.save_as_table('target', mode='overwrite')
target = session.table('target')

source = session.create_dataframe(source_data, schema=StructType([StructField("id", StringType())]))

merge_result = target.merge(
    source,
    cast(Column, (target['id'] == source['id'])),
    [when_not_matched().insert({
        'id': source['id'],
        'last_update': current_timestamp()
    })]
)

@mraraujo
Copy link
Author

mraraujo commented Jul 2, 2024

Note that the above snippet runs using snowpark 1.17.0 buts throws an exception if the version is >=1.18.0 .

@sfc-gh-sghosh
Copy link

Hello @mraraujo ,

Thank you for code snippet, we are able to reproduce the issue and confirms the issue is with local_testing, its working fine with standalone session.

The output as below with standalone session: Merge is happening properly
`------------------------------
|"ID" |"LAST_UPDATE" |

|id1 |2024-03-01 00:00:00 |


|"ID" |

|id1 |
|id2 |


|"ID" |"LAST_UPDATE" |

|id1 |2024-03-01 00:00:00 |
|id2 |2024-07-08 02:36:51.947000 |`

Where as with local_testing, the following error while merging 'SnowparkLocalTestingException: NULL result in a non-nullable column'

We will work on eliminating it.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added bug Something isn't working local testing Local Testing issues/PRs and removed status-information_needed Additional information is required from the reporter status-triage Issue is under initial triage labels Jul 8, 2024
@sfc-gh-sghosh sfc-gh-sghosh added the status-triage_done Initial triage done, will be further handled by the driver team label Jul 8, 2024
@sfc-gh-aling
Copy link
Contributor

thanks for the report. confirmed this is a bug in our insertion during the merge operation.
I have a PR out to fix the issue: #1949

@sfc-gh-aling
Copy link
Contributor

I have merged the PR, it will be carried in our next release.
if you'd like to try it out now, you could install form the git main branch.

@sfc-gh-aling
Copy link
Contributor

I'm closing the issue now, FYI we are about to do the release late this/early next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working local testing Local Testing issues/PRs status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

4 participants