SNOW-1455291: Unit Tests broken after upgrading to Snowpark 1.17 or 1.18 #1705

Closed
jeromesubs opened this issue May 30, 2024 · 5 comments
Labels
bug Something isn't working local testing Local Testing issues/PRs needs triage Initial RCA is required

Comments

@jeromesubs

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

~ python3 --version --version
Python 3.11.8 (main, Feb 26 2024, 21:39:34) [GCC 11.2.0]

  2. What are the Snowpark Python and pandas versions in the environment?

~ python3 -m pip freeze | grep -e "snowpark" -e "pandas"
pandas==2.2.1
snowflake-snowpark-python==1.18.0

  3. What did you do?

We upgraded our solution from Snowpark 1.16 to 1.17 (and then to 1.18)
and ran our pytest tests in PyCharm.

  4. What did you expect to see?

I expected all my unit tests to keep working, but 3 of them are now broken with the same error:

-------------------------------- live log setup --------------------------------
2024-05-29 20:57:20 >> snowflake.connector.connection >> INFO >> 699795 >> MainProcess >> 140457944372288 >> /home/xxx/miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/connector/connection.py >> Snowflake Connector for Python Version: 3.10.0, Python Version: 3.11.8, Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
2024-05-29 20:57:20 >> snowflake.snowpark.session >> INFO >> 699795 >> MainProcess >> 140457944372288 >> /home/xxx/miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/session.py >> Snowpark Session information:
"version" : 1.18.0,
"python.version" : 3.11.8,
"python.connector.version" : 3.10.0,
"python.connector.session.id" : 1,
"os.name" : Linux

FAILED [ 8%]
/brs/br001_test.py:37 (TestBr001ValidateInputData.test_br001_validate_input_data_empty)
self = <brs.br001_test.TestBr001ValidateInputData testMethod=test_br001_validate_input_data_empty>

def test_br001_validate_input_data_empty(self):
    repo1 = BRepository(self.__session)

    repo2 = ICCRRepository(self.__session)

    repo1.get_instrument = MagicMock(
        return_value = InOut.create_instrument_dataframe_empty(self.__session)
    )

    repo2.get_iccr = MagicMock(
        return_value = InOut.create_iccr_dataframe_empty(self.__session)
    )

    # Create an instance of Br001ValidateInputData
    validator = Br001ValidateInputData(repo1, repo2)

    # Call validate_input_data with a specific date (e.g., 2030-05-22)
    with self.assertRaises(ValueError) as cm:
        validator.validate_input_data(date(2030, 5, 22))

/brs/br001_test.py:56:


../apps/snowpark/instrument/business_rules/br001_validate_input_data.py:18: in validate_input_data
if df_instrument.count() == 0:
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/dataframe.py:2977: in count
result = df._internal_collect_with_tag(
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/_internal/telemetry.py:145: in wrap
result = func(*args, **kwargs)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/dataframe.py:645: in _internal_collect_with_tag_no_telemetry
return self._session._conn.execute(
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_connection.py:619: in execute
res = execute_mock_plan(plan, plan.expr_to_alias)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_plan.py:670: in execute_mock_plan
from_df = execute_mock_plan(from_, expr_to_alias)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_plan.py:661: in execute_mock_plan
return execute_mock_plan(source_plan.execution_plan, expr_to_alias)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_plan.py:829: in execute_mock_plan
child_rf = execute_mock_plan(source_plan.child, expr_to_alias)
../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/snowflake/snowpark/mock/_plan.py:715: in execute_mock_plan
null_rows_idxs_map[column_name] = column_series._null_rows_idxs


@final
def __getattr__(self, name: str):
    """
    After regular attribute access, try looking up the name
    This allows simpler access to columns for interactive use.
    """
    # Note: obj.x will always call obj.__getattribute__('x') prior to
    # calling obj.__getattr__('x').
    if (
        name not in self._internal_names_set
        and name not in self._metadata
        and name not in self._accessors
        and self._info_axis._can_hold_identifiers_and_holds_name(name)
    ):
        return self[name]
    return object.__getattribute__(self, name)

E AttributeError: 'TableEmulator' object has no attribute '_null_rows_idxs'

../../../../miniconda3/envs/venv311/lib/python3.11/site-packages/pandas/core/generic.py:6296: AttributeError
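
The traceback above ends in pandas' `__getattr__` fallback: normal attribute lookup fails, the name is not a column, and the final `object.__getattribute__` call raises `AttributeError`. The following is a minimal sketch of that mechanism only. `ColumnLike`, `FrameLike`, and `null_rows` are hypothetical stand-ins invented for illustration; they are not pandas or Snowpark code.

```python
class ColumnLike:
    """Hypothetical stand-in for a column object that carries the bookkeeping attribute."""

    def __init__(self, values):
        self.values = list(values)
        # Indices of rows whose value is None.
        self._null_rows_idxs = [i for i, v in enumerate(self.values) if v is None]


class FrameLike:
    """Hypothetical stand-in for a table object that only knows about its columns."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Python calls this only after normal attribute lookup has failed.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        # Nothing matched, so this raises AttributeError, as in the traceback.
        return object.__getattribute__(self, name)


def null_rows(obj):
    """Read the bookkeeping attribute, reporting a missing one as text."""
    try:
        return obj._null_rows_idxs
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

Calling `null_rows` on a `ColumnLike` returns its index list, while calling it on a `FrameLike` returns the error text. That matches the shape of the failure reported here, where the code expected a column-like object and received a `TableEmulator` instead.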

@jeromesubs jeromesubs added bug Something isn't working local testing Local Testing issues/PRs needs triage Initial RCA is required labels May 30, 2024
@github-actions github-actions bot changed the title Unit Tests broken after upgrading to Snowpark 1.17 or 1.18 SNOW-1455291: Unit Tests broken after upgrading to Snowpark 1.17 or 1.18 May 30, 2024
@sfc-gh-aling sfc-gh-aling self-assigned this May 30, 2024
@sfc-gh-aling
Contributor

Thanks @jeromesubs for reporting. I'm sorry that your unit tests are broken; we will look into this issue ASAP.

@sfc-gh-aling
Contributor

Hi @jeromesubs, can you share some more details on how validate_input_data is implemented with the Snowpark Python APIs, and ideally Br001ValidateInputData as well? It would help me reproduce the issue.

@jeromesubs
Author

jeromesubs commented May 31, 2024

Hi @sfc-gh-aling

Here is the class and method:

class Br001ValidateInputData:

    def __init__(self,
                brepo: Repo1,
                iccrepo: Repo2):
        self.__brepo = brepo
        self.__iccrepo = iccrepo

    def validate_input_data(self, mydate):

        df_instrument = self.__brepo.get_instrument(mydate)

        df_iccr = self.__iccrepo.get_iccr(mydate)

        error_text = ""
        if df_instrument.count() == 0:
            error_text += f"No Data received on {mydate.strftime('%Y-%m-%d')}\n"

        if df_iccr.count() == 0:
            error_text += f"No Data received on {mydate.strftime('%Y-%m-%d')}\n"

        knowledge_base = "Resubmit and follow the steps"
        user_guide_and_error_message = error_text + "Please resubmit once data is available." + knowledge_base

        if error_text:
            raise ValueError(user_guide_and_error_message)

Both get_instrument and get_iccr are simple table selects like the one below, but they are mocked in the unit tests:

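from datetime import datetime

import snowflake.snowpark.functions as F
import snowflake.snowpark.types as T
from snowflake import snowpark

# NOTE: `dt` (used for `dt.try_cast`) and `self.__session` are not shown in this excerpt.

class Repo1:

    def get_instrument(self, mydate: datetime) -> snowpark.DataFrame:

        schema = "S1"
        table_name = "T1"
        df_instrument = (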
            self.__session.table(f"{schema}.{table_name}")
            .filter(F.col("CURRENT") == "TRUE")
            .filter(F.col("DATE") == mydate)
            .select(
                dt.try_cast(F.col("COL1"), T.StringType()),
                dt.try_cast(F.col("COL2"), T.IntegerType()),
                dt.try_cast(F.col("COL3"), T.StringType()),
                dt.try_cast(F.col("COL4"), T.StringType()),
                dt.try_cast(F.col("COL5"), T.StringType()),
                dt.try_cast(F.col("COL6"), T.StringType()),
                dt.try_cast(F.col("COL7"), T.StringType()),
                dt.try_cast(F.col("COL8"), T.DateType()),
                dt.try_cast(F.col("COL9"), T.DateType()),
                dt.try_cast(F.col("COL10"), T.DateType()),
                dt.try_cast(F.col("COL11"), T.DateType()),
                dt.try_cast(F.col("COL12"), T.DateType()),
                dt.try_cast(F.col("COL13"), T.DateType()),
                dt.try_cast(F.col("COL14"), T.DateType())
            )
        )
        return df_instrument

The mock methods for creating empty DataFrames look like this:

def create_iccr_dataframe_empty(snowpark_session: Session) -> DataFrame:

    input_data = []

    schema = T.StructType([
        T.StructField("COL1", T.IntegerType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.DateType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.StringType()),
        T.StructField("COL1", T.DateType())
    ])

    iccr_input_df = snowpark_session.create_dataframe(input_data, schema)
    return iccr_input_df

@sfc-gh-aling
Contributor

Thanks for the code, it's very helpful for debugging.
I have identified the cause: it's a bug in handling empty data when an input data column is of type DateType.
I opened a PR for the fix: #1716
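
A reduced check for the case described above (empty input data with a DateType column) might look like the sketch below. It assumes Snowpark's local testing mode, enabled through the `local_testing` session config. The column names `ID` and `AS_OF` and the function `check_empty_date_frame` are made up for illustration, and it has not been verified here that this exact snippet fails on 1.17 or 1.18.

```python
try:
    from snowflake.snowpark import Session
    from snowflake.snowpark.types import DateType, IntegerType, StructField, StructType
except ImportError:  # Snowpark is not installed in this environment.
    Session = None


def check_empty_date_frame():
    """Return 'ok', 'affected', or 'skipped' for the empty-DateType case."""
    if Session is None:
        return "skipped"
    try:
        # Local testing mode: no connection to Snowflake is opened.
        session = Session.builder.config("local_testing", True).create()
        # Hypothetical two-column schema; only the DateType column matters here.
        schema = StructType([
            StructField("ID", IntegerType()),
            StructField("AS_OF", DateType()),
        ])
        df = session.create_dataframe([], schema)
        return "ok" if df.count() == 0 else "unexpected"
    except ImportError:
        # A dependency of local testing (for example pandas) is missing.
        return "skipped"
    except AttributeError:
        # The failure mode reported in this issue.
        return "affected"
```

Running `check_empty_date_frame()` before and after upgrading gives a quick signal without touching the real test suite.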

@sfc-gh-aling
Contributor

Hi @jeromesubs, we have merged the PR. The fix will ship in our next release, which is expected next week.
To try out the fix now, you can install from the git main branch: pip install git+https://github.com/snowflakedb/snowpark-python.git
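
Until the release is out, a test suite could check which Snowpark version is installed before relying on local testing with empty DataFrames. The sketch below flags only the 1.17 and 1.18 release lines, since those are the ones this issue names; the release number that carries the fix is not stated in this thread, so nothing is assumed about later versions. The helper names are hypothetical.

```python
from importlib import metadata

# Release lines named in the issue title; anything else is treated as unknown.
AFFECTED_RELEASE_LINES = {(1, 17), (1, 18)}


def is_reported_affected(version):
    """True if `version` (e.g. '1.18.0') is in a release line named in the issue."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        return False
    return major_minor in AFFECTED_RELEASE_LINES


def installed_snowpark_version():
    """Return the installed Snowpark version string, or None if not installed."""
    try:
        return metadata.version("snowflake-snowpark-python")
    except metadata.PackageNotFoundError:
        return None
```

A build installed from the main branch may still report a version number in one of these release lines, so the result is a hint rather than proof that the fix is missing.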


2 participants