SNOW-1105953: count_distinct mock in local testing does not work correctly #1268

Closed
stto-relex opened this issue Feb 23, 2024 · 2 comments · Fixed by #1304
Labels
bug (Something isn't working), local testing (Local Testing issues/PRs)

Comments

@stto-relex

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

3.8.10

  2. What operating system and processor architecture are you using?

    macOS-12.7.1-arm64-arm-64bit

  3. What are the component versions in the environment (pip freeze)?

snowflake-connector-python==3.7.1
snowflake-snowpark-python==1.12.1

  4. What did you do?

start conftest.py

import pytest
from snowflake.snowpark import Session


@pytest.fixture(scope="session")
def sf_session() -> Session:
    # Session backed by the Snowpark local testing framework.
    return Session.builder.config('local_testing', True).create()

end conftest.py

start test_count.py

from snowflake.snowpark import Session
# Assumption: `pf` is an alias for snowflake.snowpark.functions.
from snowflake.snowpark import functions as pf
from snowflake.snowpark.dataframe import DataFrame


def get_group_size(df: DataFrame) -> float:
    # Average number of distinct "b" values per "a" group.
    return float(
        df.select("a", "b")
        .group_by("a")
        .agg(pf.count_distinct("b").alias("C"))
        .select(pf.avg("C").alias("C"))
        .collect()[0].as_dict().pop("C")
    )


def test_get_group_size(sf_session: Session):
    aa = [
        "a1", "a1", "a1", "a1",
        "a2", "a2", "a2",
        "a3", "a3",
        "a4",
    ]
    bb = [
        "b1", "b2", "b2", "b3",
        "b1", "b2", "b5",
        "b1", "b4",
        "b1",
    ]
    df = sf_session.create_dataframe(
        [[a, b] for a, b in zip(aa, bb)], ["a", "b"]
    )
    assert get_group_size(df) == 2.25

end test_count.py

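The assertion fails because get_group_size(df) returns 1.5.

The count_distinct mock computes a distinct count of 3 for the first group
and 1 for every other group, which is incorrect: the actual distinct counts
are 3, 3, 2, and 1, giving an average of 2.25.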
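
To make the numbers concrete, the per-group distinct counts can be worked out in plain Python, without Snowpark. This is only a sketch of the arithmetic: `distinct_counts` is a hypothetical helper name, and nothing here reflects how the local testing mock is actually implemented.

```python
from collections import defaultdict
from statistics import mean

aa = ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a4"]
bb = ["b1", "b2", "b2", "b3", "b1", "b2", "b5", "b1", "b4", "b1"]


def distinct_counts(keys, values):
    """Hypothetical helper: number of distinct values seen for each key."""
    groups = defaultdict(set)
    for key, value in zip(keys, values):
        groups[key].add(value)
    return {key: len(vals) for key, vals in groups.items()}


# a1 -> {b1, b2, b3}, a2 -> {b1, b2, b5}, a3 -> {b1, b4}, a4 -> {b1}
expected = distinct_counts(aa, bb)
expected_avg = mean(list(expected.values()))  # (3 + 3 + 2 + 1) / 4

# Counts described in this report for the mock: 3 for the first group, 1 for the rest.
reported_avg = mean([3, 1, 1, 1])  # (3 + 1 + 1 + 1) / 4

print(expected, expected_avg, reported_avg)
```

The first average matches the asserted 2.25, and the second matches the 1.5 returned under local testing.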

  5. What did you expect to see?

get_group_size(df) should return 2.25 as asserted.

  6. Can you set logging to DEBUG and collect the logs?

pytest -v -c /dev/null --log-cli-level=DEBUG tests/test_transformations.py::test_get_group_size
================================================================================================================= test session starts =================================================================================================================
platform darwin -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 --
cachedir: .pytest_cache
rootdir: /tests, configfile: ../../../../../dev/null
plugins: datafiles-3.0.0, cov-3.0.0
collected 1 item

tests/test_transformations.py::test_get_group_size
------------------------------------------------------------------------------------------------------------------- live log setup --------------------------------------------------------------------------------------------------------------------
INFO snowflake.snowpark.session:session.py:442 Snowpark Session information:
"version" : 1.12.1,
"python.version" : 3.8.10,
"python.connector.version" : 3.7.1,
"python.connector.session.id" : 1,
"os.name" : Darwin

FAILED [100%]

====================================================================================================================== FAILURES =======================================================================================================================
_________________________________________________________________________________________________________________ test_get_group_size _________________________________________________________________________________________________________________

sf_session = <snowflake.snowpark.session.Session object at 0x137ee78b0>

def test_get_group_size(sf_session: Session):

    aa = [
        "a1", "a1", "a1", "a1",
        "a2", "a2", "a2",
        "a3", "a3",
        "a4",
    ]

    bb = [
        "b1", "b2", "b2", "b3",
        "b1", "b2", "b5",
        "b1", "b4",
        "b1",
    ]
    df = sf_session.create_dataframe(
        [[a, b] for a, b in zip(aa, bb)], ["a", "b"]
    )
>   assert get_group_size(df) == 2.25

E   assert 1.5 == 2.25
E     +1.5
E     -2.25

tests/test_transformations.py:76: AssertionError
---------------------------------------------------------------------------------------------------------------- Captured stderr setup ----------------------------------------------------------------------------------------------------------------
2024-02-23 15:44:30,005 - MainThread session.py:442 - init() - INFO - Snowpark Session information:
"version" : 1.12.1,
"python.version" : 3.8.10,
"python.connector.version" : 3.7.1,
"python.connector.session.id" : 1,
"os.name" : Darwin

----------------------------------------------------------------------------------------------------------------- Captured log setup ------------------------------------------------------------------------------------------------------------------
INFO snowflake.snowpark.session:session.py:442 Snowpark Session information:
"version" : 1.12.1,
"python.version" : 3.8.10,
"python.connector.version" : 3.7.1,
"python.connector.session.id" : 1,
"os.name" : Darwin
=============================================================================================================== short test summary info ===============================================================================================================
FAILED tests/test_transformations.py::test_get_group_size - assert 1.5 == 2.25

stto-relex added the bug and needs triage labels on Feb 23, 2024
github-actions bot changed the title from "count_distinct mock in local testing does not work correctly" to "SNOW-1105953: count_distinct mock in local testing does not work correctly" on Feb 23, 2024
sfc-gh-aling added the local testing label on Feb 23, 2024
@sfc-gh-aling
Contributor

Thanks @stto-relex for reaching out and sharing the code with us. We will take a look at the issue.

@sfc-gh-stan
Collaborator

This should be fixed by the linked PR. Please reopen if you find any behavior inconsistency, thank you!
