SNOW-1447522: Session.create_dataframe fails in local testing. #1680

Closed
Zedarflight opened this issue May 24, 2024 · 3 comments
Labels: local testing (Local Testing issues/PRs), question (Further information is requested), status-triage_done (Initial triage done, will be further handled by the driver team), triaged

Comments

@Zedarflight

  1. What version of Python are you using?

    Python 3.11.9

  2. What operating system and processor architecture are you using?

    Linux-4.18.0-372.64.1.el8_6.x86_64-x86_64-with-glibc2.31

  3. What are the component versions in the environment (pip freeze)?

    Snowpark version 1.17.0

  4. What did you do?
    I'm attempting to set up unit tests with pytest for a project I'm working on (following these docs), and have found that Session.create_dataframe doesn't work because of how TableEmulator.__init__ behaves.

Easily reproducible example:

from snowflake.snowpark import Session
session = Session.builder.config('local_testing', True).create()

df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"]) # Create a Snowpark dataframe

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 4
      1 from snowflake.snowpark import Session
      2 session = Session.builder.config('local_testing', True).create()
----> 4 df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"]) # Create a Snowpark dataframe
...
snip
...
File /redacted/lib/python3.11/site-packages/snowflake/snowpark/mock/_snowflake_data_type.py:255, in TableEmulator.__init__(self, sf_types, sf_types_by_col_index, *args, **kwargs)
    248 def __init__(
    249     self,
    250     *args,
   (...)
    253     **kwargs,
    254 ) -> None:
--> 255     super().__init__(*args, **kwargs)
    256     self.sf_types = {} if not sf_types else sf_types
    257     # TODO: SNOW-976145, move to index based approach to store col type mapping

TypeError: object.__init__() takes exactly one argument (the instance to initialize)
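
The message indicates that the `super().__init__(*args, **kwargs)` call ended up at `object.__init__`, which accepts no extra arguments. Why the base class resolved that way in 1.17.0 is not established in this thread. The sketch below only reproduces the generic Python behavior; `EmulatorSketch` and `try_construct` are hypothetical stand-ins, not Snowpark code.

```python
class EmulatorSketch(object):  # hypothetical stand-in; the base is plain object
    def __init__(self, *args, sf_types=None, **kwargs):
        # Forwarding data arguments to object.__init__ is what raises TypeError
        super().__init__(*args, **kwargs)
        self.sf_types = {} if not sf_types else sf_types


def try_construct(*args, **kwargs):
    """Return the TypeError message if construction fails, else None."""
    try:
        EmulatorSketch(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# Passing table data mirrors what create_dataframe hands to the emulator
error_message = try_construct([[1, 2], [3, 4]], columns=["a", "b"])
print(error_message)

# With no extra arguments the same constructor succeeds
ok_message = try_construct()
```

If the base class were a type that accepts data and `columns`, the same call would presumably succeed, which would fit a fix arriving with a package upgrade.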

Other reference used - create_dataframe usage + syntax was pulled from the create_dataframe page.

Some context of the actual use case:

from functools import partial

import snowflake.snowpark  # import the subpackage explicitly so snowflake.snowpark resolves

mocked_session = snowflake.snowpark.Session.builder.config('local_testing', True).create()

# Mock up specific SQL queries
def mock_sql(session, query):  # replacement for Session.sql
    if query == "SHOW GRANTS TO USER testuser":
        return session.create_dataframe([snowflake.snowpark.Row(role='ROLENAMEHERE', granted_to="USER", grantee_name="testuser", granted_by="conftest_setup")])
    else:
        raise RuntimeError(f"Unexpected query execution: {query}")

# `mocker` is presumably the pytest-mock fixture, so this line runs inside a test or fixture
mocker.patch.object(mocked_session, 'sql', wraps=partial(mock_sql, mocked_session))  # apply patch for SQL operations
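
To check the patching pattern on its own, separately from the create_dataframe failure, the same `wraps=partial(...)` idea can be run against a stand-in object using only the standard library. This is a sketch under assumptions: `FakeSession` is hypothetical and returns plain lists rather than Snowpark DataFrames, and `unittest.mock` replaces the pytest-mock fixture.

```python
from functools import partial
from unittest import mock


class FakeSession:
    """Hypothetical stand-in for a session object; not the Snowpark API."""

    def sql(self, query):
        raise AssertionError("the real sql() should not run in a unit test")

    def create_dataframe(self, rows):
        # A plain list stands in for a DataFrame in this sketch
        return list(rows)


def mock_sql(session, query):
    """Return canned rows for known queries and fail loudly otherwise."""
    if query == "SHOW GRANTS TO USER testuser":
        return session.create_dataframe(
            [{"role": "ROLENAMEHERE", "grantee_name": "testuser"}]
        )
    raise RuntimeError(f"Unexpected query execution: {query}")


session = FakeSession()
with mock.patch.object(
    session, "sql", wraps=partial(mock_sql, session)
) as patched_sql:
    # A known query is routed to mock_sql and returns the canned rows
    grants = session.sql("SHOW GRANTS TO USER testuser")
    # An unknown query surfaces as RuntimeError instead of silently passing
    try:
        session.sql("SELECT 1")
        unexpected_error = None
    except RuntimeError as exc:
        unexpected_error = str(exc)
    call_count = patched_sql.call_count
```

If this runs cleanly, the patch wiring is sound and any remaining failure comes from the session's own create_dataframe.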
  5. What did you expect to see?

    I expected to be able to use Session.create_dataframe in local testing, since Session.create_dataframe is on the list of supported APIs.

  6. Can you set logging to DEBUG and collect the logs?

2024-05-24 23:12:54,567 - MainThread connection.py:399 - __init__() - INFO - Snowflake Connector for Python Version: 3.10.1, Python Version: 3.11.9, Platform: Linux-4.18.0-372.64.1.el8_6.x86_64-x86_64-with-glibc2.31
2024-05-24 23:12:54,571 - MainThread session.py:506 - __init__() - INFO - Snowpark Session information: 
"version" : 1.17.0,
"python.version" : 3.11.9,
"python.connector.version" : 3.10.1,
"python.connector.session.id" : 1,
"os.name" : Linux

stacktrace of error, see 4.
@Zedarflight added the bug (Something isn't working) and needs triage (Initial RCA is required) labels on May 24, 2024
@github-actions (bot) changed the title from "Session.create_dataframe fails in local testing." to "SNOW-1447522: Session.create_dataframe fails in local testing." on May 24, 2024
@sfc-gh-aling
Contributor

Hey @Zedarflight, local testing requires pandas as a dependency. Have you installed pandas in your env?

@sfc-gh-aling added the question (Further information is requested), triaged, and local testing (Local Testing issues/PRs) labels and removed the bug (Something isn't working) and needs triage (Initial RCA is required) labels on May 28, 2024
@sfc-gh-sghosh self-assigned this on May 29, 2024
@sfc-gh-sghosh

Hello @Zedarflight,

Thanks for raising the issue.

As my colleague said, local testing requires the pandas package to be installed locally. I tested the code and it's working fine.

`from snowflake.snowpark.session import Session
from snowflake.snowpark import functions as F
from snowflake.snowpark.types import *

import pandas as pd
import numpy as np
session = Session.builder.config('local_testing', True).create()

df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"]) # Create a Snowpark dataframe
df.show()

Output:

-------------
|"A"  |"B"  |
-------------
|1    |2    |
|3    |4    |
-------------`

This is not a Snowflake issue; it's a configuration issue.

Regards,
Sujan

@sfc-gh-sghosh added the status-triage_done (Initial triage done, will be further handled by the driver team) label on May 29, 2024
@Zedarflight
Author

Pandas was present in the environment. Upgrading to 1.18.0 has fixed the issue.
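
Since the fix came from a version bump rather than from installing pandas, it may help to confirm which versions the test interpreter actually sees before and after upgrading. A minimal sketch using only the standard library; `installed_version` and `report` are hypothetical helper names, and the distribution names are the PyPI names of the two packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def report(distributions=("snowflake-snowpark-python", "pandas")):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


versions = report()
for name, found in versions.items():
    print(f"{name}: {found or 'not installed'}")
```

Running this with the same interpreter that pytest uses avoids being misled by a different virtual environment.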


3 participants