SNOW-940613: Snowpark dataframe join fails with Invalid identifier when using dataframe alias #1087

Open
jayajs opened this issue Oct 13, 2023 · 1 comment
Labels: bug (Something isn't working), needs triage (Initial RCA is required)

jayajs commented Oct 13, 2023

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.9.13

  2. What operating system and processor architecture are you using?

Windows-10-10.0.19045-SP0

  3. What are the component versions in the environment (pip freeze)?

snowpark version = 1.8.0
snowflake.connector = 3.2.0

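  4. What did you do?

# Imports and new_session (an existing Session) are assumed; the report omits them.
from snowflake.snowpark import functions as F
from snowflake.snowpark.functions import col

df1 = new_session.create_dataframe([[1, 2, 3, 4]], schema=["e", "f", "g", "h"])
df1 = df1.withColumn("I", F.lit(2))
df1.show()
df2 = new_session.create_dataframe([[1, 2, 3, 4]], schema=["e", "k", "l", "m"])
df1.alias("a").join(df2.alias("b"), how="left", on=(col("a", "E") == col("b", "E")))
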
  5. What did you expect to see?

    Merged output of df1 and df2 on the columns specified

  6. Can you set logging to DEBUG and collect the logs?

{
	"name": "SnowparkSQLException",
	"message": "(1304): 01af9f74-0302-d8e5-0046-c50126a56137: 000904 (42000): SQL compilation error: error line 1 at position 56
invalid identifier 'I'",
	"stack": "---------------------------------------------------------------------------
SnowparkSQLException                      Traceback (most recent call last)
~\\AppData\\Local\\Temp\\ipykernel_61256\\797523069.py in <module>
      1 df2 = new_session.create_dataframe([[1, 2, 3, 4]], schema=[\"e\",\"k\", \"l\", \"m\"])
----> 2 df1.alias(\"a\").join(df2.alias(\"b\") , how = 'left' , on = (col(\"a\", \"E\") == col(\"b\" , \"E\")))

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\telemetry.py in wrap(*args, **kwargs)
    182     @functools.wraps(func)
    183     def wrap(*args, **kwargs):
--> 184         r = func(*args, **kwargs)
    185         plan = r._select_statement or r._plan
    186         # Some DataFrame APIs call other DataFrame APIs, so we need to remove the extra call

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\dataframe.py in join(self, right, on, how, lsuffix, rsuffix, **kwargs)
   2181                 )
   2182 
-> 2183             return self._join_dataframes(
   2184                 right,
   2185                 using_columns,

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\dataframe.py in _join_dataframes(self, right, using_columns, join_type, lsuffix, rsuffix)
   2405     ) -> \"DataFrame\":
   2406         if isinstance(using_columns, Column):
-> 2407             return self._join_dataframes_internal(
   2408                 right,
   2409                 join_type,

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\dataframe.py in _join_dataframes_internal(self, right, join_type, join_exprs, lsuffix, rsuffix)
   2478             return self._with_plan(
   2479                 SelectStatement(
-> 2480                     from_=SelectSnowflakePlan(
   2481                         join_logical_plan, analyzer=self._session._analyzer
   2482                     ),

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\select_statement.py in __init__(self, snowflake_plan, analyzer)
    361             snowflake_plan
    362             if isinstance(snowflake_plan, SnowflakePlan)
--> 363             else analyzer.resolve(snowflake_plan)
    364         )
    365         self.expr_to_alias.update(self._snowflake_plan.expr_to_alias)

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\analyzer.py in resolve(self, logical_plan)
    695         self.generated_alias_maps = {}
    696 
--> 697         result = self.do_resolve(logical_plan)
    698 
    699         result.add_aliases(self.generated_alias_maps)

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\analyzer.py in do_resolve(self, logical_plan)
    738             self.alias_maps_to_use = use_maps
    739 
--> 740         res = self.do_resolve_with_resolved_children(
    741             logical_plan, resolved_children, df_aliased_col_name_to_real_col_name
    742         )

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\analyzer.py in do_resolve_with_resolved_children(self, logical_plan, resolved_children, df_aliased_col_name_to_real_col_name)
    829 
    830         if isinstance(logical_plan, Join):
--> 831             return self.plan_builder.join(
    832                 resolved_children[logical_plan.left],
    833                 resolved_children[logical_plan.right],

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\snowflake_plan.py in join(self, left, right, join_type, condition, source_plan, use_constant_subquery_alias)
    545         use_constant_subquery_alias: bool,
    546     ):
--> 547         return self.build_binary(
    548             lambda x, y: join_statement(
    549                 x, y, join_type, condition, use_constant_subquery_alias

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\snowflake_plan.py in wrap(*args, **kwargs)
    109             def wrap(*args, **kwargs):
    110                 try:
--> 111                     return func(*args, **kwargs)
    112                 except snowflake.connector.errors.ProgrammingError as e:
    113                     query = None

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\snowflake_plan.py in build_binary(self, sql_generator, left, right, source_plan)
    379         )
    380 
--> 381         left_schema_query = schema_value_statement(select_left.attributes)
    382         right_schema_query = schema_value_statement(select_right.attributes)
    383         schema_query = sql_generator(left_schema_query, right_schema_query)

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\functools.py in __get__(self, instance, owner)
    991                 val = cache.get(self.attrname, _NOT_FOUND)
    992                 if val is _NOT_FOUND:
--> 993                     val = self.func(instance)
    994                     try:
    995                         cache[self.attrname] = val

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\snowflake_plan.py in attributes(self)
    248     @cached_property
    249     def attributes(self) -> List[Attribute]:
--> 250         output = analyze_attributes(self.schema_query, self.session)
    251         self.schema_query = schema_value_statement(output)
    252         return output

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\schema_utils.py in analyze_attributes(sql, session)
     80         return convert_result_meta_to_attribute(session._conn._cursor.description)
     81 
---> 82     return session._get_result_attributes(sql)
     83 
     84 

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\session.py in _get_result_attributes(self, query)
   1722 
   1723     def _get_result_attributes(self, query: str) -> List[Attribute]:
-> 1724         return self._conn.get_result_attributes(query)
   1725 
   1726     def get_session_stage(self) -> str:

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\snowflake_plan.py in wrap(*args, **kwargs)
    174                                 e
    175                             )
--> 176                             raise ne.with_traceback(tb) from None
    177                     else:
    178                         ne = SnowparkClientExceptionMessages.SQL_EXCEPTION_FROM_PROGRAMMING_ERROR(

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\analyzer\\snowflake_plan.py in wrap(*args, **kwargs)
    109             def wrap(*args, **kwargs):
    110                 try:
--> 111                     return func(*args, **kwargs)
    112                 except snowflake.connector.errors.ProgrammingError as e:
    113                     query = None

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\snowpark\\_internal\\server_connection.py in get_result_attributes(self, query)
    204     @SnowflakePlan.Decorator.wrap_exception
    205     def get_result_attributes(self, query: str) -> List[Attribute]:
--> 206         return convert_result_meta_to_attribute(self._cursor.describe(query))
    207 
    208     @_Decorator.log_msg_and_perf_telemetry(\"Uploading file to stage\")

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\connector\\cursor.py in describe(self, *args, **kwargs)
    928         \"\"\"
    929         kwargs[\"_describe_only\"] = kwargs[\"_is_internal\"] = True
--> 930         self.execute(*args, **kwargs)
    931         return self._description
    932 

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\connector\\cursor.py in execute(self, command, params, _bind_stage, timeout, _exec_async, _no_retry, _do_reset, _put_callback, _put_azure_callback, _put_callback_output_stream, _get_callback, _get_azure_callback, _get_callback_output_stream, _show_progress_bar, _statement_params, _is_internal, _describe_only, _no_results, _is_put_get, _raise_put_get_error, _force_put_overwrite, _skip_upload_on_content_match, file_stream, num_statements)
    906             )  # NULL result in a non-nullable column
    907             error_class = IntegrityError if is_integrity_error else ProgrammingError
--> 908             Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
    909         return self
    910 

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\connector\\errors.py in errorhandler_wrapper(connection, cursor, error_class, error_value)
    288         \"\"\"
    289 
--> 290         handed_over = Error.hand_to_other_handler(
    291             connection,
    292             cursor,

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\connector\\errors.py in hand_to_other_handler(connection, cursor, error_class, error_value)
    343         if cursor is not None:
    344             cursor.messages.append((error_class, error_value))
--> 345             cursor.errorhandler(connection, cursor, error_class, error_value)
    346             return True
    347         elif connection is not None:

c:\\Users\\NNREVENDJA\\.conda\\envs\\dev3\\lib\\site-packages\\snowflake\\connector\\errors.py in default_errorhandler(connection, cursor, error_class, error_value)
    219         errno = error_value.get(\"errno\")
    220         done_format_msg = error_value.get(\"done_format_msg\")
--> 221         raise error_class(
    222             msg=error_value.get(\"msg\"),
    223             errno=None if errno is None else int(errno),

SnowparkSQLException: (1304): 01af9f74-0302-d8e5-0046-c50126a56137: 000904 (42000): SQL compilation error: error line 1 at position 56
invalid identifier 'I'"
}
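
For reference, the "merged output" expected from the reproduction can be sketched in plain Python, with no Snowflake session involved. This is only an illustration of left-join semantics on the sample rows, not Snowpark's implementation: the `left_join` helper and the `a.`/`b.` column prefixes are hypothetical, and the column names Snowpark would actually emit for the overlapping `E` column may differ.

```python
# Plain-Python stand-ins for the two DataFrames in the report.
# Column names are uppercase because Snowflake uppercases unquoted identifiers.
df1_rows = [{"E": 1, "F": 2, "G": 3, "H": 4, "I": 2}]  # includes the added column I = 2
df2_rows = [{"E": 1, "K": 2, "L": 3, "M": 4}]


def left_join(left, right, key):
    """Left join two lists of dict rows on `key` (hypothetical helper).

    Output columns are prefixed with "a." / "b." purely for illustration.
    """
    right_cols = list(right[0].keys()) if right else []
    out = []
    for l_row in left:
        # SQL equality never matches NULL, so skip None keys explicitly.
        matches = [
            r_row
            for r_row in right
            if l_row[key] is not None and r_row[key] == l_row[key]
        ]
        if not matches:
            # Unmatched left rows are kept, with NULLs for the right side.
            matches = [{c: None for c in right_cols}]
        for r_row in matches:
            row = {f"a.{c}": v for c, v in l_row.items()}
            row.update({f"b.{c}": v for c, v in r_row.items()})
            out.append(row)
    return out


expected = left_join(df1_rows, df2_rows, "E")
print(expected)
```

With the sample data this yields a single row that carries all five columns of `df1` (including `I`) followed by the four columns of `df2`, which is the shape the failing `join` call was expected to return.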

jayajs added the bug and needs triage labels Oct 13, 2023
github-actions bot changed the title from "Snowpark dataframe join fails with Invalid identifier when using dataframe alias" to "SNOW-940613: Snowpark dataframe join fails with Invalid identifier when using dataframe alias" Oct 13, 2023
sfc-gh-jdu (Collaborator) commented

Thanks for your feedback, we are looking into it. cc @sfc-gh-stan
