Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1513884: Local Testing Issues with dateadd and current_timestamp Functions in Snowpark Python #1859

Closed
hima-gopisetty opened this issue Jul 1, 2024 · 4 comments
Assignees
Labels
local testing Local Testing issues/PRs status-triage_done Initial triage done, will be further handled by the driver team

Comments

@hima-gopisetty
Copy link

Please answer these questions before submitting your issue. Thanks!

1. What version of Python are you using?

   Python 3.11.8 

2. What are the Snowpark Python and pandas versions in the environment?

  pandas==1.5.3
  snowflake-snowpark-python==1.14.0

3. What did you do?

I was using the Snowflake Snowpark Python library to work with DataFrames, specifically adding current timestamps and manipulating date values. Below is the complete, executable code that led to the issues

from snowflake.snowpark.functions import dateadd, lit, current_timestamp
from snowflake.snowpark.types import StringType, StructField, StructType, TimestampType
from snowflake.snowpark import Session

# Create a session - ensure you set up your own configuration appropriately
session = Session.builder.config('local_testing', True).create()

# Create a Snowflake DataFrame
sf_df = session.create_dataframe(
    data = [
        ["RL_TEST_ROLE_GRANTS", "ABC123", "2000-01-01 00:00:00"],
        ["RL_TEST_ROLE_SUPPORT", "ABC123", "2000-01-01 00:00:00"]
    ],
    schema=StructType([
        StructField("ROLE_NAME", StringType()),
        StructField("GIT_COMMIT_VALUE", StringType()),
        StructField("RECORD_CREATED_TS", TimestampType()),
    ])
)

# Add current_timestamp to the DataFrame
sf_ts = sf_df.with_column("current_timestamp", current_timestamp())
#sf_ts.show()

# Attempt to add 10 days to each date in the DataFrame using dateadd
sf_df_with_added_days = sf_df.with_column("date_plus_10_days", dateadd("day", lit(10), current_timestamp()))
sf_df_with_added_days.show()

4. What did you expect to see?

  • For the current_timestamp addition:

    • I expected that adding the current timestamp would successfully append a new column current_timestamp to each row, showing the exact current timestamp, with no null or NaT values.
  • For the date_plus_10_days calculation:

    • The expectation was that the dateadd function would add 10 days to the dates in the RECORD_CREATED_TS column without any errors, accurately reflecting the new dates.

5. What happened instead?

  • For the current_timestamp addition:
    • The new column current_timestamp unexpectedly included NaT values for rows where ROLE_NAME was 'RL_TEST_ROLE_SUPPORT'.

The following table illustrates the issue with the CURRENT_TIMESTAMP column, where one row unexpectedly contains a NaT value:

| ROLE_NAME            | GIT_COMMIT_VALUE | RECORD_CREATED_TS   | CURRENT_TIMESTAMP          |
|----------------------|------------------|---------------------|----------------------------|
| RL_TEST_ROLE_GRANTS  | ABC123           | 2000-01-01 00:00:00 | 2024-07-01 02:18:38.944660 |
| RL_TEST_ROLE_SUPPORT | ABC123           | 2000-01-01 00:00:00 | NaT                        |
  • For the date_plus_10_days calculation:
    • The attempt to add days resulted in a type error: numpy.core._exceptions._UFuncBinaryResolutionError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('O'), indicating a problem with how data types are handled during date arithmetic in Snowpark.
Traceback (most recent call last):
  File "test.py", line 27, in <module>
    sf_df_with_added_days.show()
  File "telemetry.py", line 139, in wrap
    result = func(*args, **kwargs)
  File "dataframe.py", line 3050, in show
    self._show_string(
  File "dataframe.py", line 3168, in _show_string
    result, meta = self._session._conn.get_result_and_metadata(
  File "_connection.py", line 649, in get_result_and_metadata
    res = execute_mock_plan(plan)
  File "_plan.py", line 462, in execute_mock_plan
    column_series = calculate_expression(
  File "_plan.py", line 1318, in calculate_expression
    return calculate_expression(exp.child, input_data, analyzer, expr_to_alias)
  File "_plan.py", line 1320, in calculate_expression
    return handle_function_expression(exp, input_data, analyzer, expr_to_alias)
  File "_plan.py", line 388, in handle_function_expression
    return _MOCK_FUNCTION_IMPLEMENTATION_MAP[func_name](*to_pass_args)
  File "_functions.py", line 1146, in mock_dateadd
    res = datetime_expr.combine(
  File "series.py", line 3396, in combine
    new_values[i] = func(lv, rv)
  File "_functions.py", line 1147, in <lambda>
    value_expr, lambda date, duration: func(cast(date), duration)
  File "_functions.py", line 1107, in add_timedelta
    return date + datetime.timedelta(**{f"{unit}s": duration * scalar})
numpy.core._exceptions._UFuncBinaryResolutionError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('O')

Additional Note:

It's important to mention that the operations described above work as expected when executed in a standard Snowflake session. The issues outlined occur exclusively during local testing with the Snowpark library. This suggests that the mock environment might not fully replicate the behavior of the actual Snowflake environment or there might be issues specific to the local testing implementation.

@hima-gopisetty hima-gopisetty added bug Something isn't working local testing Local Testing issues/PRs needs triage Initial RCA is required labels Jul 1, 2024
@github-actions github-actions bot changed the title Local Testing Issues with dateadd and current_timestamp Functions in Snowpark Python SNOW-1513884: Local Testing Issues with dateadd and current_timestamp Functions in Snowpark Python Jul 1, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Jul 1, 2024
@sfc-gh-sghosh
Copy link

Hello @hima-gopisetty ,

Thanks for raising the issue, we are looking into it, will update.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage Issue is under initial triage and removed needs triage Initial RCA is required labels Jul 1, 2024
@sfc-gh-sghosh
Copy link

Hello @hima-gopisetty ,

I tried the code with local testing and its working as expected. There are no Null or Nat values getting inserted, and you need to change the code to add 10 days to column 'RECORD_CREATED_TS'

For the current_timestamp addition:

Q1. I expected that adding the current timestamp would successfully append a new column current_timestamp to each row, showing the exact current timestamp, with no null or NaT values

Ans: Its not adding null or NaT values, proper current_timestamp is getting added to the dataframe.

Q2 For the date_plus_10_days calculation:

The expectation was that the dateadd function would add 10 days to the dates in the RECORD_CREATED_TS column without any errors, accurately reflecting the new dates.

Ans: The code needs to be modified, please use the following code

sf_df_with_added_days = sf_df.with_column("date_plus_10_days", dateadd("day", lit(10), col("RECORD_CREATED_TS")))
sf_df_with_added_days.show()

Please find the updated code and output

`from snowflake.snowpark.functions import dateadd, lit, current_timestamp
from snowflake.snowpark.types import StringType, StructField, StructType, TimestampType
from snowflake.snowpark import Session

Create a session - ensure you set up your own configuration appropriately

session = Session.builder.config('local_testing', True).create()

Create a Snowflake DataFrame

sf_df = session.create_dataframe(
data = [
["RL_TEST_ROLE_GRANTS", "ABC123", "2000-01-01 00:00:00"],
["RL_TEST_ROLE_SUPPORT", "ABC123", "2000-01-01 00:00:00"]
],
schema=StructType([
StructField("ROLE_NAME", StringType()),
StructField("GIT_COMMIT_VALUE", StringType()),
StructField("RECORD_CREATED_TS", TimestampType()),
])
)
sf_df.show()

Add current_timestamp to the DataFrame

sf_ts = sf_df.with_column("current_timestamp", current_timestamp())
sf_ts.show()

Attempt to add 10 days to each date in the DataFrame using dateadd

sf_df_with_added_days = sf_df.with_column("date_plus_10_days", dateadd("day", lit(10), current_timestamp()))
sf_df_with_added_days.show()

Add 10 days to the RECORD_CREATED_TS column using dateadd

sf_df_with_added_days = sf_df.with_column("date_plus_10_days", dateadd("day", lit(10), col("RECORD_CREATED_TS")))
sf_df_with_added_days.show()


|"ROLE_NAME" |"GIT_COMMIT_VALUE" |"RECORD_CREATED_TS" |

|RL_TEST_ROLE_GRANTS |ABC123 |2000-01-01 00:00:00 |
|RL_TEST_ROLE_SUPPORT |ABC123 |2000-01-01 00:00:00 |


|"ROLE_NAME" |"GIT_COMMIT_VALUE" |"RECORD_CREATED_TS" |"CURRENT_TIMESTAMP" |

|RL_TEST_ROLE_GRANTS |ABC123 |2000-01-01 00:00:00 |2024-07-04 23:10:52.426186 |
|RL_TEST_ROLE_SUPPORT |ABC123 |2000-01-01 00:00:00 |2024-07-04 23:10:52.426186 |


|"ROLE_NAME" |"GIT_COMMIT_VALUE" |"RECORD_CREATED_TS" |"DATE_PLUS_10_DAYS" |

|RL_TEST_ROLE_GRANTS |ABC123 |2000-01-01 00:00:00 |2024-07-14 23:10:52.436700 |
|RL_TEST_ROLE_SUPPORT |ABC123 |2000-01-01 00:00:00 |2024-07-14 23:10:52.436700 |


|"ROLE_NAME" |"GIT_COMMIT_VALUE" |"RECORD_CREATED_TS" |"DATE_PLUS_10_DAYS" |

|RL_TEST_ROLE_GRANTS |ABC123 |2000-01-01 00:00:00 |2000-01-11 00:00:00 |
|RL_TEST_ROLE_SUPPORT |ABC123 |2000-01-01 00:00:00 |2000-01-11 00:00:00 |

`

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed bug Something isn't working status-triage Issue is under initial triage labels Jul 4, 2024
@hima-gopisetty
Copy link
Author

Hi @sfc-gh-sghosh ,

Could you let me know if you are using the same snowflake-snowpark-python==1.14.0 version as mine? When I run it again locally, I encounter the same issue where it's adding NaT to rows. In fact, it's adding NaT to all rows except the first one

-------------------------------------------------------------------------------------------------
|"ROLE_NAME"            |"GIT_COMMIT_VALUE"  |"RECORD_CREATED_TS"  |"CURRENT_TIMESTAMP"         |
-------------------------------------------------------------------------------------------------
|RL_TEST_ROLE_GRANTS    |ABC123              |2000-01-01 00:00:00  |2024-07-04 21:53:13.171780  |
|RL_TEST_ROLE_SUPPORT   |ABC123              |2000-01-01 00:00:00  |NaT                         |
|RL_TEST_ROLE_SUPPORT2  |ABC123              |2000-01-01 00:00:00  |NaT                         |
|RL_TEST_ROLE_SUPPORT3  |ABC123              |2000-01-01 00:00:00  |NaT                         |
-------------------------------------------------------------------------------------------------

@sfc-gh-sghosh
Copy link

Hello @hima-gopisetty ,

Yes, you are right. The issue is with snowpark python 1.14.0. I am also getting NaT.
But with 1.16.0 or with the latest 1.19.0, there is no such error, which means the issue has been fixed already, so please use the latest Snowpark python connector, 1.19.0.

Regards,
Sujan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
local testing Local Testing issues/PRs status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

2 participants