SNOW-912320: The behavior of save_as_table(mode='overwrite') is inconsistent between v1.5.1 and v1.6.1 #1046

Closed
everpeace opened this issue Sep 11, 2023 · 1 comment · Fixed by #1075
Labels
bug Something isn't working triaged

everpeace commented Sep 11, 2023

  1. What version of Python are you using?

    3.8.17 (default, Jul 5 2023, 20:35:26) [GCC 11.2.0] (Python worksheet in the Snowflake console)

  2. What operating system and processor architecture are you using?

    Linux-5.4.181-99.354.amzn2.aarch64-aarch64-with-glibc2.17 (Python worksheet in the Snowflake console)

  3. What are the component versions in the environment (pip freeze)?

    Tested in a Python worksheet in the Snowflake console with only the snowflake-snowpark-python dependency (Anaconda).

  4. What did you do?

    Ran the Python worksheet below with snowflake-snowpark-python==1.5.1 and with snowflake-snowpark-python==1.6.1; the results are inconsistent.

# The Snowpark package is required for Python Worksheets. 
# You can add more packages by selecting them using the Packages control and then importing them.

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, lit
from snowflake.snowpark.types import LongType


def main(session: snowpark.Session) -> snowpark.DataFrame: 
    TABLE_NAME = "_TEST"
    # clean the table first
    session.sql(f"DROP TABLE IF EXISTS {TABLE_NAME}").collect()

    # create table with dataframe
    df = session.create_dataframe([1,2]).to_df(["a"])
    df.write.save_as_table(TABLE_NAME)

    # adding column 'b' as LongType() with NULL values
    # note: lit(None) is inferred as StringType(),
    #       so add a dummy column holding None first
    #       and then cast it to LongType()
    tbl = session.table(TABLE_NAME).select('*')
    tbl = tbl.with_column('b_none', lit(None))
    tbl = tbl.with_column('b', col('b_none').astype(LongType()))
    tbl = tbl.drop('b_none')

    # overwrite it
    tbl.write.save_as_table(TABLE_NAME, column_order='name', mode='overwrite')

    # The output of snowflake-snowpark-python v1.5.1:
    # --------------
    # |"A"  |"B"   |
    # --------------
    # |1    |NULL  |
    # |2    |NULL  |
    # --------------
    #
    # The output of snowflake-snowpark-python v1.6.1:
    # (expected to be the same as v1.5.1)
    # -------------
    # |"A"  |"B"  |
    # -------------
    # |     |     |
    # -------------
    return session.table(TABLE_NAME).select('*')
  5. What did you expect to see?

    Consistent results between v1.5.1 and v1.6.1.

  6. Can you set logging to DEBUG and collect the logs?

    I could not extract logs from the Python worksheet in the Snowflake console, sorry.

@everpeace everpeace added bug Something isn't working needs triage Initial RCA is required labels Sep 11, 2023
@github-actions github-actions bot changed the title The behavior of save_as_table(mode='overwrite') is inconsistent between v1.5.1 and v1.6.1 SNOW-912320: The behavior of save_as_table(mode='overwrite') is inconsistent between v1.5.1 and v1.6.1 Sep 11, 2023
@sfc-gh-aalam sfc-gh-aalam self-assigned this Sep 14, 2023
sfc-gh-aalam (Contributor) commented

Version 1.6.1 introduced a behavior change: if the schema of the DataFrame (either inferred or explicitly defined) is non-nullable, save_as_table no longer inserts NULL columns into the table (see https://github.com/snowflakedb/snowpark-python/releases/tag/v1.6.1). As part of this change, save_as_table now creates the table in a two-step process: (1) create or replace table, then (2) insert into table.

When you overwrite the same table name that the DataFrame reads from, create or replace table wipes the source data in column A before the insert runs, which leaves you with an empty table. Can you try overwriting into a different table name?
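
To illustrate the ordering problem described above, here is a minimal sketch that mimics the two-step overwrite with Python's built-in sqlite3 module. It is only an analogy, not the SQL that Snowpark actually issues: SQLite has no CREATE OR REPLACE TABLE, so DROP plus CREATE stands in for it, and the helper names (`overwrite_in_two_steps`, `demo`) and the `_TEST_NEW` table are hypothetical.

```python
import sqlite3


def overwrite_in_two_steps(conn, target, source_query, columns_ddl):
    """Mimic a two-step overwrite: replace the table, then insert from a lazy query."""
    # Step 1: stand-in for CREATE OR REPLACE TABLE (assumption: DROP + CREATE).
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} ({columns_ddl})")
    # Step 2: the source query is evaluated only now, after the target was replaced.
    conn.execute(f"INSERT INTO {target} {source_query}")
    return conn.execute(f"SELECT * FROM {target} ORDER BY 1").fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE _TEST (a INTEGER)")
    conn.executemany("INSERT INTO _TEST VALUES (?)", [(1,), (2,)])

    # Like a DataFrame, this query is lazy: it reads _TEST whenever it is executed.
    lazy_query = "SELECT a, CAST(NULL AS INTEGER) AS b FROM _TEST"
    ddl = "a INTEGER, b INTEGER"

    # Writing to a different table leaves the source intact, so rows survive.
    other = overwrite_in_two_steps(conn, "_TEST_NEW", lazy_query, ddl)
    # Writing to the source table replaces it first, so the query finds no rows.
    same = overwrite_in_two_steps(conn, "_TEST", lazy_query, ddl)
    return other, same


if __name__ == "__main__":
    other, same = demo()
    print("different target:", other)
    print("same target:", same)
```

In this sketch the write to a different target keeps both rows with a NULL second column, while the write back to the source table ends up empty, which matches the suggestion to overwrite into a different table name.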
