SNOW-1455291: Unit Tests broken after upgrading to Snowpark 1.17 or 1.18 #1705

jeromesubs · 2024-05-30T01:09:51Z

Please answer these questions before submitting your issue. Thanks!

What version of Python are you using?

~ python3 --version --version
Python 3.11.8 (main, Feb 26 2024, 21:39:34) [GCC 11.2.0]

What are the Snowpark Python and pandas versions in the environment?

~ python3 -m pip freeze | grep -e "snowpark" -e "pandas"
pandas==2.2.1
snowflake-snowpark-python==1.18.0

What did you do?

We upgraded our solution from Snowpark 1.16 to 1.17 (and then to 1.18)
and ran our pytest tests in PyCharm.

What did you expect to see?

I expected all my unit tests to keep working, but 3 of them are now broken with the same error:

-------------------------------- live log setup --------------------------------
2024-05-29 20:57:20 >> snowflake.connector.connection >> INFO >> 699795 >> MainProcess >> 140457944372288 >> /home/xxx/miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/connector/connection.py >> Snowflake Connector for Python Version: 3.10.0, Python Version: 3.11.8, Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
2024-05-29 20:57:20 >> snowflake.snowpark.session >> INFO >> 699795 >> MainProcess >> 140457944372288 >> /home/xxx/miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/session.py >> Snowpark Session information:
"version" : 1.18.0,
"python.version" : 3.11.8,
"python.connector.version" : 3.10.0,
"python.connector.session.id" : 1,
"os.name" : Linux

FAILED [ 8%]
/brs/br001_test.py:37 (TestBr001ValidateInputData.test_br001_validate_input_data_empty)
self = <brs.br001_test.TestBr001ValidateInputData testMethod=test_br001_validate_input_data_empty>

def test_br001_validate_input_data_empty(self):
    repo1 = BRepository(self.__session)

    repo2 = ICCRRepository(self.__session)

    repo1.get_instrument = MagicMock(
        return_value = InOut.create_instrument_dataframe_empty(self.__session)
    )

    repo2.get_iccr = MagicMock(
        return_value = InOut.create_iccr_dataframe_empty(self.__session)
    )

    # Create an instance of Br001ValidateInputData
    validator = Br001ValidateInputData(repo1, repo2)

    # Call validate_input_data with a specific date (e.g., 2030-05-22)
    with self.assertRaises(ValueError) as cm:

        validator.validate_input_data(date(2030, 5, 22))

/brs/br001_test.py:56:

../apps/snowpark/instrument/business_rules/br001_validate_input_data.py:18: in validate_input_data
if df_instrument.count() == 0:
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/dataframe.py:2977: in count
result = df._internal_collect_with_tag(
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/_internal/telemetry.py:145: in wrap
result = func(*args, **kwargs)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/dataframe.py:645: in _internal_collect_with_tag_no_telemetry
return self._session._conn.execute(
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_connection.py:619: in execute
res = execute_mock_plan(plan, plan.expr_to_alias)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_plan.py:670: in execute_mock_plan
from_df = execute_mock_plan(from_, expr_to_alias)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_plan.py:661: in execute_mock_plan
return execute_mock_plan(source_plan.execution_plan, expr_to_alias)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_plan.py:829: in execute_mock_plan
child_rf = execute_mock_plan(source_plan.child, expr_to_alias)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_plan.py:715: in execute_mock_plan
null_rows_idxs_map[column_name] = column_series._null_rows_idxs

@final
def __getattr__(self, name: str):
    """
    After regular attribute access, try looking up the name
    This allows simpler access to columns for interactive use.
    """
    # Note: obj.x will always call obj.__getattribute__('x') prior to
    # calling obj.__getattr__('x').
    if (
        name not in self._internal_names_set
        and name not in self._metadata
        and name not in self._accessors
        and self._info_axis._can_hold_identifiers_and_holds_name(name)
    ):
        return self[name]

    return object.__getattribute__(self, name)

E AttributeError: 'TableEmulator' object has no attribute '_null_rows_idxs'

../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/pandas/core/generic.py:6296: AttributeError
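
The error is raised from pandas/core/generic.py rather than from Snowpark code because, as the comment in the traceback notes, `__getattr__` only runs after the regular attribute lookup has already failed. The following is a minimal plain-Python sketch of that fallback. The `Frame`, `Emulator`, and `lookup` names are hypothetical stand-ins, not the pandas or Snowpark implementations, and the conditional assignment is only a way to produce a missing attribute, not a claim about how the actual bug arises.

```python
class Frame:
    """Hypothetical stand-in for a pandas-like base class."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Reached only when the normal attribute lookup has already failed.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        # Re-run the normal lookup so the standard AttributeError is raised.
        return object.__getattribute__(self, name)


class Emulator(Frame):
    """Hypothetical subclass whose extra attribute is not always set."""

    def __init__(self, columns, track_nulls):
        super().__init__(columns)
        if track_nulls:
            self._null_rows_idxs = []


def lookup(obj, name):
    """Return the attribute value, or the AttributeError text if it is missing."""
    try:
        return getattr(obj, name)
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

Calling `lookup(Emulator({"A": [1]}, False), "_null_rows_idxs")` falls through to `__getattr__` and reports the missing attribute against the subclass name, which mirrors the shape of the message in the traceback above.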


sfc-gh-aling · 2024-05-30T18:56:42Z

Thanks @jeromesubs for reporting. I'm sorry that your unit tests are broken; we will look into this issue ASAP.

sfc-gh-aling · 2024-05-30T22:47:38Z

Hi @jeromesubs, can you share some more details on how validate_input_data is implemented with the Snowpark Python APIs, and probably Br001ValidateInputData as well? It would help me reproduce the issue.

jeromesubs · 2024-05-31T12:20:34Z

Hi @sfc-gh-aling

Here is the class and method:

class Br001ValidateInputData:

    def __init__(self,
                brepo: Repo1,
                iccrepo: Repo2):
        self.__brepo = brepo
        self.__iccrepo = iccrepo

    def validate_input_data(self, mydate):

        df_instrument = self.__brepo.get_instrument(mydate)

        df_iccr = self.__iccrepo.get_iccr(mydate)

        error_text = ""
        if df_instrument.count() == 0:
            error_text += f"No Data received on {mydate.strftime('%Y-%m-%d')}\n"

        if df_iccr.count() == 0:
            error_text += f"No Data received on {mydate.strftime('%Y-%m-%d')}\n"

        knowledge_base = "Resubmit and follow the steps"
        user_guide_and_error_message = error_text+"Please resubmit once data is available."+ knowledge_base

        if error_text:
            raise ValueError(user_guide_and_error_message)

Both get_instrument and get_iccr are simple table selects like the one below, but they are mocked in the unit test:

class Repo1:

    def get_instrument(self, mydate: datetime) -> snowpark.DataFrame:

        schema = "S1"
        table_name = "T1"
        df_instrument = (
            self.__session.table(f"{schema}.{table_name}")
            .filter(F.col("CURRENT") == "TRUE")
            .filter(F.col("DATE") == mydate)
            .select(
                dt.try_cast(F.col("COL1"), T.StringType()),
                dt.try_cast(F.col("COL2"), T.IntegerType()),
                dt.try_cast(F.col("COL3"), T.StringType()),
                dt.try_cast(F.col("COL4"), T.StringType()),
                dt.try_cast(F.col("COL5"), T.StringType()),
                dt.try_cast(F.col("COL6"), T.StringType()),
                dt.try_cast(F.col("COL7"), T.StringType()),
                dt.try_cast(F.col("COL8"), T.DateType()),
                dt.try_cast(F.col("COL9"), T.DateType()),
                dt.try_cast(F.col("COL10"), T.DateType()),
                dt.try_cast(F.col("COL11"), T.DateType()),
                dt.try_cast(F.col("COL12"), T.DateType()),
                dt.try_cast(F.col("COL13"), T.DateType()),
                dt.try_cast(F.col("COL14"), T.DateType())
            )
        )
        return df_instrument

The mock methods for creating an empty DataFrame look like this:

def create_iccr_dataframe_empty(snowpark_session: Session) -> DataFrame:

    input_data = []

    schema = T.StructType([
        T.StructField("COL1", T.IntegerType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.DateType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.DateType())
    ])

    iccr_input_df = snowpark_session.create_dataframe(input_data, schema)
    return iccr_input_df
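
For reference, the validation logic itself can be exercised without going through the local-testing executor by stubbing `count()` on the returned DataFrame. This is only a sketch of a test-side workaround, not the fix for the underlying bug. The validator below is a simplified copy of the class above, and `make_repo_stub` and `run_empty_case` are hypothetical helper names introduced for illustration.

```python
from datetime import date
from unittest.mock import MagicMock


class Br001ValidateInputData:
    """Simplified copy of the validator shown above."""

    def __init__(self, brepo, iccrepo):
        self._brepo = brepo
        self._iccrepo = iccrepo

    def validate_input_data(self, mydate):
        df_instrument = self._brepo.get_instrument(mydate)
        df_iccr = self._iccrepo.get_iccr(mydate)

        error_text = ""
        if df_instrument.count() == 0:
            error_text += f"No Data received on {mydate:%Y-%m-%d}\n"
        if df_iccr.count() == 0:
            error_text += f"No Data received on {mydate:%Y-%m-%d}\n"

        if error_text:
            raise ValueError(error_text + "Please resubmit once data is available.")


def make_repo_stub(method_name, row_count):
    """Hypothetical helper: a repository whose DataFrame stub reports row_count rows."""
    frame = MagicMock(name="DataFrame")
    frame.count.return_value = row_count
    repo = MagicMock(name="Repository")
    getattr(repo, method_name).return_value = frame
    return repo


def run_empty_case():
    """Run the validator against two empty stubs and return the error text, if any."""
    validator = Br001ValidateInputData(
        make_repo_stub("get_instrument", 0),
        make_repo_stub("get_iccr", 0),
    )
    try:
        validator.validate_input_data(date(2030, 5, 22))
    except ValueError as exc:
        return str(exc)
    return None
```

The trade-off is that a stubbed `count()` no longer checks anything about the Snowpark schema or the local-testing session, so it covers the business rule only.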

sfc-gh-aling · 2024-05-31T17:56:42Z

Thanks for the code, it's very helpful for debugging.
I have identified a bug in how empty data is handled when an input column is of type DateType.
I opened a PR with the fix: #1716

sfc-gh-aling · 2024-06-03T17:09:30Z

Hi @jeromesubs, we have merged the PR. It will ship in our next release, which is expected next week.
To try out the fix now, you can install from the git main branch: pip install git+https://github.com/snowflakedb/snowpark-python.git
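
After installing from main, it can be worth confirming which Snowpark build the test environment actually picks up. A minimal sketch using only the standard library follows; the `installed_version` helper name is hypothetical, and the version string that contains the fix is not stated in this thread.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as it appears in the pip freeze output above.
    print(installed_version("snowflake-snowpark-python"))
```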

jeromesubs added the bug (Something isn't working), local testing (Local Testing issues/PRs), and needs triage (Initial RCA is required) labels May 30, 2024

github-actions bot changed the title ~~Unit Tests broken after upgrading to Snowpark 1.17 or 1.18~~ SNOW-1455291: Unit Tests broken after upgrading to Snowpark 1.17 or 1.18 May 30, 2024

sfc-gh-aling self-assigned this May 30, 2024

sfc-gh-aling closed this as completed Jun 3, 2024
