Merge branch 'main' into jrose_snow_1651234_structured_create_dataframe

snowflakedb · Sep 17, 2024 · a803ca7
2 parents bfea23c + 8414933
commit a803ca7

Showing 117 changed files with 7,866 additions and 5,786 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,12 +1,36 @@
 # Release History
 
-## 1.22.0 (TBD)
+## 1.23.0 (TBD)
+
+### Snowpark pandas API Updates
+
+#### Improvements
+
+- Improved `to_pandas` to persist the original timezone offset for TIMESTAMP_TZ type.
+
+#### New Features
+
+- Added support for `TimedeltaIndex.mean` method.
+- Added support for some cases of aggregating `Timedelta` columns on `axis=0` with `agg` or `aggregate`.
+- Added support for `by`, `left_by`, and `right_by` for `pd.merge_asof`.
+
+#### Bug Fixes
+
+- Fixed a bug where an `Index` object created from a `Series`/`DataFrame` incorrectly updates the `Series`/`DataFrame`'s index name after an inplace update has been applied to the original `Series`/`DataFrame`.
+- Suppressed an unhelpful `SettingWithCopyWarning` that sometimes appeared when printing `Timedelta` columns.
+
+
+## 1.22.1 (2024-09-11)
+This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
+
+
+## 1.22.0 (2024-09-10)
 
 ### Snowpark Python API Updates
 
 ### New Features
 
-- Added following new functions in `snowflake.snowpark.functions`:
+- Added the following new functions in `snowflake.snowpark.functions`:
   - `array_remove`
   - `ln`
 
@@ -46,14 +70,14 @@
 - Fixed a bug in `session.read.csv` that caused an error when setting `PARSE_HEADER = True` in an externally defined file format.
 - Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
 - Fixed a bug in `session.get_session_stage` that referenced a non-existing stage after switching database or schema.
-- Fixed a bug where calling `DataFrame.to_snowpark_pandas_dataframe` without explicitly initializing the Snowpark pandas plugin caused an error.
+- Fixed a bug where calling `DataFrame.to_snowpark_pandas` without explicitly initializing the Snowpark pandas plugin caused an error.
 - Fixed a bug where using the `explode` function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the `outer` parameter.
 
 ### Snowpark Local Testing Updates
 
 #### New Features
 
-- Added support for type coercion when passing columns as input to udf calls
+- Added support for type coercion when passing columns as input to UDF calls.
 - Added support for `Index.identical`.
 
 #### Bug Fixes
@@ -105,6 +129,9 @@
 - Added support for creating a `DatetimeIndex` from an `Index` of numeric or string type.
 - Added support for string indexing with `Timedelta` objects.
 - Added support for `Series.dt.total_seconds` method.
+- Added support for `DataFrame.apply(axis=0)`.
+- Added support for `Series.dt.tz_convert` and `Series.dt.tz_localize`.
+- Added support for `DatetimeIndex.tz_convert` and `DatetimeIndex.tz_localize`.
 
 #### Improvements
 
@@ -113,9 +140,11 @@
 - Improved `pd.to_datetime` to handle all local input cases. 
 - Create a lazy index from another lazy index without pulling data to client.
 - Raised `NotImplementedError` for Index bitwise operators.
-- Display a clearer error message when `Index.names` is set to a non-like-like object.
+- Display a clearer error message when `Index.names` is set to a non-list-like object.
 - Raise a warning whenever MultiIndex values are pulled in locally.
 - Improve warning message for `pd.read_snowflake` to include the creation reason when temp table creation is triggered.
+- Improve performance for `DataFrame.set_index`, or setting `DataFrame.index` or `Series.index`, by avoiding checks that require eager evaluation. As a consequence, when the new index does not match the current `Series`/`DataFrame` object length, a `ValueError` is no longer raised. Instead, when the `Series`/`DataFrame` object is longer than the provided index, the `Series`/`DataFrame`'s new index is filled with `NaN` values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
+- Properly raise `NotImplementedError` when `ambiguous`/`nonexistent` are non-string in `ceil`/`floor`/`round`.
 
 #### Bug Fixes
 
@@ -126,10 +155,6 @@
 - Fixed a bug where `Series.reindex` and `DataFrame.reindex` did not update the result index's name correctly.
 - Fixed a bug where `Series.take` did not error when `axis=1` was specified.
 
-#### Behavior Change
-
-- When calling `DataFrame.set_index`, or setting `DataFrame.index` or `Series.index`, with a new index that does not match the current length of the `Series`/`DataFrame` object, a `ValueError` is no longer raised. When the `Series`/`DataFrame` object is longer than the new index, the `Series`/`DataFrame`'s new index is filled with `NaN` values for the "extra" elements. When the `Series`/`DataFrame` object is shorter than the new index, the extra values in the new index are ignored—`Series` and `DataFrame` stay the same length `n`, and use only the first `n` values of the new index.
-
 
 ## 1.21.1 (2024-09-05)
 
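The length-mismatch rule that the 1.22.0 notes describe for `DataFrame.set_index` (and for assigning `DataFrame.index` or `Series.index`) can be modeled in plain Python. This is only a sketch of the stated semantics, not Snowpark pandas code; `align_index` is a hypothetical helper introduced for illustration.

```python
import math
from typing import Any, List, Sequence


def align_index(n_rows: int, new_index: Sequence[Any]) -> List[Any]:
    """Hypothetical model of the documented rule; not Snowpark pandas code."""
    # Keep at most n_rows labels: extra labels in new_index are ignored.
    labels = list(new_index[:n_rows])
    # Pad with NaN when the object has more rows than labels were provided.
    labels.extend([math.nan] * (n_rows - len(labels)))
    return labels


# Object (3 rows) is longer than the index (2 labels): last label becomes NaN.
padded = align_index(3, ["a", "b"])
# Object (2 rows) is shorter than the index (4 labels): only the first 2 are used.
truncated = align_index(2, ["a", "b", "c", "d"])
```

In both cases the result has exactly `n_rows` labels, which matches the note that the object keeps its length and no `ValueError` is raised.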

diff --git a/docs/source/modin/series.rst b/docs/source/modin/series.rst
@@ -279,6 +279,8 @@ Series
     Series.dt.seconds
     Series.dt.microseconds
     Series.dt.nanoseconds
+    Series.dt.tz_convert
+    Series.dt.tz_localize
 
 
 .. rubric:: String accessor methods

diff --git a/docs/source/modin/supported/dataframe_supported.rst b/docs/source/modin/supported/dataframe_supported.rst
@@ -84,7 +84,7 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``any``                     | P                               |                                  | ``N`` for non-integer/boolean types                |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``apply``                   | P                               |                                  | ``N`` if ``axis == 0`` or ``func`` is not callable |
+| ``apply``                   | P                               |                                  | ``N`` if ``func`` is not callable                  |
 |                             |                                 |                                  | or ``result_type`` is given or ``args`` and        |
 |                             |                                 |                                  | ``kwargs`` contain DataFrame or Series             |
 |                             |                                 |                                  | ``N`` if ``func`` maps to different column labels. |
@@ -471,8 +471,7 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``to_xml``                  | N                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``transform``               | P                               |                                  |  Only callable and string parameters are supported.|
-|                             |                                 |                                  |  list and dict parameters are not supported.       |
+| ``transform``               | P                               |                                  | ``Y`` if ``func`` is callable.                     |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``transpose``               | P                               |                                  | See ``T``                                          |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+

diff --git a/docs/source/modin/supported/datetime_index_supported.rst b/docs/source/modin/supported/datetime_index_supported.rst
@@ -82,9 +82,9 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``snap``                    | N                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``tz_convert``              | N                               |                                  |                                                    |
+| ``tz_convert``              | Y                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``tz_localize``             | N                               |                                  |                                                    |
+| ``tz_localize``             | P                               | ``ambiguous``, ``nonexistent``   |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``round``                   | P                               | ``ambiguous``, ``nonexistent``   |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+

diff --git a/docs/source/modin/supported/general_supported.rst b/docs/source/modin/supported/general_supported.rst
@@ -38,8 +38,7 @@ Data manipulations
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``merge``                   | P                               | ``validate``                     | ``N`` if param ``validate`` is given               |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``merge_asof``              | P                               | ``by``, ``left_by``, ``right_by``| ``N`` if param ``direction`` is ``nearest``.       |
-|                             |                                 | , ``left_index``, ``right_index``|                                                    |
+| ``merge_asof``              | P                               | ``left_index``, ``right_index``, | ``N`` if param ``direction`` is ``nearest``.       |
 |                             |                                 | ``suffixes``, ``tolerance``      |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``merge_ordered``           | N                               |                                  |                                                    |

diff --git a/docs/source/modin/supported/series_dt_supported.rst b/docs/source/modin/supported/series_dt_supported.rst
@@ -80,9 +80,10 @@ the method in the left column.
 +-----------------------------+---------------------------------+----------------------------------------------------+
 | ``to_pydatetime``           | N                               |                                                    |
 +-----------------------------+---------------------------------+----------------------------------------------------+
-| ``tz_localize``             | N                               |                                                    |
+| ``tz_localize``             | P                               | ``N`` if `ambiguous` or `nonexistent` are set to a |
+|                             |                                 | non-default value.                                 |
 +-----------------------------+---------------------------------+----------------------------------------------------+
-| ``tz_convert``              | N                               |                                                    |
+| ``tz_convert``              | Y                               |                                                    |
 +-----------------------------+---------------------------------+----------------------------------------------------+
 | ``normalize``               | Y                               |                                                    |
 +-----------------------------+---------------------------------+----------------------------------------------------+

diff --git a/docs/source/modin/supported/timedelta_index_supported.rst b/docs/source/modin/supported/timedelta_index_supported.rst
@@ -44,7 +44,7 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+-------------------------------------------+
 | ``ceil``                    | Y                               |                                  |                                           |
 +-----------------------------+---------------------------------+----------------------------------+-------------------------------------------+
-| ``mean``                    | N                               |                                  |                                           |
+| ``mean``                    | Y                               |                                  |                                           |
 +-----------------------------+---------------------------------+----------------------------------+-------------------------------------------+
 | ``total_seconds``           | Y                               |                                  |                                           |
 +-----------------------------+---------------------------------+----------------------------------+-------------------------------------------+
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
@@ -1,5 +1,5 @@
 {% set name = "snowflake-snowpark-python" %}
-{% set version = "1.21.1" %}
+{% set version = "1.22.1" %}
 
 package:
   name: {{ name|lower }}

diff --git a/src/snowflake/snowpark/_internal/analyzer/analyzer.py b/src/snowflake/snowpark/_internal/analyzer/analyzer.py
@@ -956,10 +956,7 @@ def do_resolve_with_resolved_children(
                 schema_query = schema_query_for_values_statement(logical_plan.output)
 
             if logical_plan.data:
-                if (
-                    len(logical_plan.output) * len(logical_plan.data)
-                    < ARRAY_BIND_THRESHOLD
-                ):
+                if not logical_plan.is_large_local_data:
                     return self.plan_builder.query(
                         values_statement(logical_plan.output, logical_plan.data),
                         logical_plan,
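The hunk above moves the inline size check behind an `is_large_local_data` attribute on the logical plan. A minimal sketch of what such a property could look like follows, assuming it is simply the negation of the removed condition. `LocalValues` is a hypothetical stand-in for the plan node, and the threshold value shown is an assumption for illustration rather than something taken from this diff.

```python
from dataclasses import dataclass
from typing import Any, List

# Assumed value for illustration only; the real constant lives in Snowpark internals.
ARRAY_BIND_THRESHOLD = 512


@dataclass
class LocalValues:
    """Hypothetical stand-in for the logical plan node holding local rows."""

    output: List[str]      # column names
    data: List[List[Any]]  # row values

    @property
    def is_large_local_data(self) -> bool:
        # Negation of the removed inline check:
        # "columns * rows < threshold" used to mean "small enough to inline".
        return len(self.output) * len(self.data) >= ARRAY_BIND_THRESHOLD


small = LocalValues(["a", "b"], [[1, 2]] * 10)   # 20 cells
large = LocalValues(["a", "b"], [[1, 2]] * 256)  # 512 cells
```

Keeping the rule on the plan node means every caller shares one definition of "large" instead of repeating the arithmetic.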

diff --git a/src/snowflake/snowpark/_internal/analyzer/binary_expression.py b/src/snowflake/snowpark/_internal/analyzer/binary_expression.py
@@ -2,11 +2,12 @@
 # Copyright (c) 2012-2024 Snowflake Computing Inc. All rights reserved.
 #
 
-from typing import AbstractSet, Optional
+from typing import AbstractSet, List, Optional
 
 from snowflake.snowpark._internal.analyzer.expression import (
     Expression,
     derive_dependent_columns,
+    derive_dependent_columns_with_duplication,
 )
 from snowflake.snowpark._internal.analyzer.query_plan_analysis_utils import (
     PlanNodeCategory,
@@ -29,6 +30,9 @@ def __str__(self):
     def dependent_column_names(self) -> Optional[AbstractSet[str]]:
         return derive_dependent_columns(self.left, self.right)
 
+    def dependent_column_names_with_duplication(self) -> List[str]:
+        return derive_dependent_columns_with_duplication(self.left, self.right)
+
     @property
     def plan_node_category(self) -> PlanNodeCategory:
         return PlanNodeCategory.LOW_IMPACT
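The new `dependent_column_names_with_duplication` method returns a `List[str]` where the existing method returns a set. The difference can be sketched with a toy expression tree. The classes and functions below (`Col`, `BinaryOp`, `names_with_duplication`, `names`) are hypothetical and only illustrate the set-versus-list distinction; the real helpers in `expression.py` are not shown in this diff and may differ, for example in how they treat missing dependency information.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Col:
    """Hypothetical leaf expression referencing one column."""

    name: str


@dataclass
class BinaryOp:
    """Hypothetical binary expression with two children."""

    left: "Node"
    right: "Node"


Node = Union[Col, BinaryOp]


def names_with_duplication(node: Node) -> List[str]:
    # One entry per column reference, so repeated columns appear repeatedly.
    if isinstance(node, Col):
        return [node.name]
    return names_with_duplication(node.left) + names_with_duplication(node.right)


def names(node: Node) -> Set[str]:
    # Set-based variant: each column appears at most once.
    return set(names_with_duplication(node))


# Models an expression such as (A + B) + A, which references column A twice.
expr = BinaryOp(BinaryOp(Col("A"), Col("B")), Col("A"))
```

Here `names(expr)` collapses the two references to `A`, while `names_with_duplication(expr)` keeps both, which is the kind of information a reference count per column would need.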