Merge branch 'main' into jrose_snow_1651234_structured_create_dataframe
sfc-gh-jrose authored Sep 17, 2024
2 parents bfea23c + 8414933 commit a803ca7
Showing 117 changed files with 7,866 additions and 5,786 deletions.
43 changes: 34 additions & 9 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,36 @@
# Release History

## 1.22.0 (TBD)
## 1.23.0 (TBD)

### Snowpark pandas API Updates

#### Improvements

- Improved `to_pandas` to persist the original timezone offset for TIMESTAMP_TZ type.

#### New Features

- Added support for `TimedeltaIndex.mean` method.
- Added support for some cases of aggregating `Timedelta` columns on `axis=0` with `agg` or `aggregate`.
- Added support for `by`, `left_by`, and `right_by` for `pd.merge_asof`.

#### Bug Fixes

- Fixed a bug where an `Index` object created from a `Series`/`DataFrame` incorrectly updates the `Series`/`DataFrame`'s index name after an inplace update has been applied to the original `Series`/`DataFrame`.
- Suppressed an unhelpful `SettingWithCopyWarning` that sometimes appeared when printing `Timedelta` columns.


## 1.22.1 (2024-09-11)
This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.


## 1.22.0 (2024-09-10)

### Snowpark Python API Updates

### New Features

- Added following new functions in `snowflake.snowpark.functions`:
- Added the following new functions in `snowflake.snowpark.functions`:
- `array_remove`
- `ln`

@@ -46,14 +70,14 @@
- Fixed a bug in `session.read.csv` that caused an error when setting `PARSE_HEADER = True` in an externally defined file format.
- Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
- Fixed a bug in `session.get_session_stage` that referenced a non-existing stage after switching database or schema.
- Fixed a bug where calling `DataFrame.to_snowpark_pandas_dataframe` without explicitly initializing the Snowpark pandas plugin caused an error.
- Fixed a bug where calling `DataFrame.to_snowpark_pandas` without explicitly initializing the Snowpark pandas plugin caused an error.
- Fixed a bug where using the `explode` function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the `outer` parameter.

### Snowpark Local Testing Updates

#### New Features

- Added support for type coercion when passing columns as input to udf calls
- Added support for type coercion when passing columns as input to UDF calls.
- Added support for `Index.identical`.

#### Bug Fixes
@@ -105,6 +129,9 @@
- Added support for creating a `DatetimeIndex` from an `Index` of numeric or string type.
- Added support for string indexing with `Timedelta` objects.
- Added support for `Series.dt.total_seconds` method.
- Added support for `DataFrame.apply(axis=0)`.
- Added support for `Series.dt.tz_convert` and `Series.dt.tz_localize`.
- Added support for `DatetimeIndex.tz_convert` and `DatetimeIndex.tz_localize`.

#### Improvements

@@ -113,9 +140,11 @@
- Improved `pd.to_datetime` to handle all local input cases.
- Create a lazy index from another lazy index without pulling data to client.
- Raised `NotImplementedError` for Index bitwise operators.
- Display a clearer error message when `Index.names` is set to a non-like-like object.
- Display a more clear error message when `Index.names` is set to a non-like-like object.
- Raise a warning whenever MultiIndex values are pulled in locally.
- Improve warning message for `pd.read_snowflake` include the creation reason when temp table creation is triggered.
- Improve performance for `DataFrame.set_index`, or setting `DataFrame.index` or `Series.index` by avoiding checks require eager evaluation. As a consequence, when the new index that does not match the current `Series`/`DataFrame` object length, a `ValueError` is no longer raised. Instead, when the `Series`/`DataFrame` object is longer than the provided index, the `Series`/`DataFrame`'s new index is filled with `NaN` values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
- Properly raise `NotImplementedError` when ambiguous/nonexistent are non-string in `ceil`/`floor`/`round`.
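
The length-mismatch rule described in the `DataFrame.set_index` entry above can be sketched in plain Python. This is only an illustration of the documented semantics, not Snowpark pandas code: `align_index` is a hypothetical helper, and the sketch assumes the result depends only on the row count and the provided index values.

```python
import math
from typing import Any, List


def align_index(new_index: List[Any], n_rows: int) -> List[Any]:
    """Return the index labels an object with n_rows rows would end up with.

    Hypothetical helper that mirrors the changelog wording:
    - object longer than the index: pad the "extra" rows with NaN
    - object shorter than the index: ignore the extra index values
    """
    if n_rows > len(new_index):
        padding = [math.nan] * (n_rows - len(new_index))
        return list(new_index) + padding
    # Keep only the first n_rows labels; the rest are ignored.
    return list(new_index[:n_rows])


# Three rows, two labels: the third label becomes NaN.
padded = align_index(["a", "b"], 3)

# Two rows, four labels: only the first two labels are used.
truncated = align_index(["a", "b", "c", "d"], 2)
```

In neither case is a `ValueError` raised, which is the behavior change the entry calls out.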

#### Bug Fixes

@@ -126,10 +155,6 @@
- Fixed a bug where `Series.reindex` and `DataFrame.reindex` did not update the result index's name correctly.
- Fixed a bug where `Series.take` did not error when `axis=1` was specified.

#### Behavior Change

- When calling `DataFrame.set_index`, or setting `DataFrame.index` or `Series.index`, with a new index that does not match the current length of the `Series`/`DataFrame` object, a `ValueError` is no longer raised. When the `Series`/`DataFrame` object is longer than the new index, the `Series`/`DataFrame`'s new index is filled with `NaN` values for the "extra" elements. When the `Series`/`DataFrame` object is shorter than the new index, the extra values in the new index are ignored—`Series` and `DataFrame` stay the same length `n`, and use only the first `n` values of the new index.


## 1.21.1 (2024-09-05)

2 changes: 2 additions & 0 deletions docs/source/modin/series.rst
@@ -279,6 +279,8 @@ Series
Series.dt.seconds
Series.dt.microseconds
Series.dt.nanoseconds
Series.dt.tz_convert
Series.dt.tz_localize


.. rubric:: String accessor methods
5 changes: 2 additions & 3 deletions docs/source/modin/supported/dataframe_supported.rst
@@ -84,7 +84,7 @@ Methods
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``any`` | P | | ``N`` for non-integer/boolean types |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``apply`` | P | | ``N`` if ``axis == 0`` or ``func`` is not callable |
| ``apply`` | P | | ``N`` if ``func`` is not callable |
| | | | or ``result_type`` is given or ``args`` and |
| | | | ``kwargs`` contain DataFrame or Series |
| | | | ``N`` if ``func`` maps to different column labels. |
@@ -471,8 +471,7 @@ Methods
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``to_xml`` | N | | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``transform`` | P | | Only callable and string parameters are supported.|
| | | | list and dict parameters are not supported. |
| ``transform`` | P | | ``Y`` if ``func`` is callable. |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``transpose`` | P | | See ``T`` |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
4 changes: 2 additions & 2 deletions docs/source/modin/supported/datetime_index_supported.rst
@@ -82,9 +82,9 @@ Methods
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``snap`` | N | | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``tz_convert`` | N | | |
| ``tz_convert`` | Y | | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``tz_localize`` | N | | |
| ``tz_localize`` | P | ``ambiguous``, ``nonexistent`` | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``round`` | P | ``ambiguous``, ``nonexistent`` | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
3 changes: 1 addition & 2 deletions docs/source/modin/supported/general_supported.rst
@@ -38,8 +38,7 @@ Data manipulations
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``merge`` | P | ``validate`` | ``N`` if param ``validate`` is given |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``merge_asof`` | P | ``by``, ``left_by``, ``right_by``| ``N`` if param ``direction`` is ``nearest``. |
| | | , ``left_index``, ``right_index``| |
| ``merge_asof`` | P | ``left_index``, ``right_index``, | ``N`` if param ``direction`` is ``nearest``. |
| | | , ``suffixes``, ``tolerance`` | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``merge_ordered`` | N | | |
5 changes: 3 additions & 2 deletions docs/source/modin/supported/series_dt_supported.rst
@@ -80,9 +80,10 @@ the method in the left column.
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``to_pydatetime`` | N | |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``tz_localize`` | N | |
| ``tz_localize`` | P | ``N`` if `ambiguous` or `nonexistent` are set to a |
| | | non-default value. |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``tz_convert`` | N | |
| ``tz_convert`` | Y | |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``normalize`` | Y | |
+-----------------------------+---------------------------------+----------------------------------------------------+
2 changes: 1 addition & 1 deletion docs/source/modin/supported/timedelta_index_supported.rst
@@ -44,7 +44,7 @@ Methods
+-----------------------------+---------------------------------+----------------------------------+-------------------------------------------+
| ``ceil`` | Y | | |
+-----------------------------+---------------------------------+----------------------------------+-------------------------------------------+
| ``mean`` | N | | |
| ``mean`` | Y | | |
+-----------------------------+---------------------------------+----------------------------------+-------------------------------------------+
| ``total_seconds`` | Y | | |
+-----------------------------+---------------------------------+----------------------------------+-------------------------------------------+
2 changes: 1 addition & 1 deletion recipe/meta.yaml
@@ -1,5 +1,5 @@
 {% set name = "snowflake-snowpark-python" %}
-{% set version = "1.21.1" %}
+{% set version = "1.22.1" %}

 package:
   name: {{ name|lower }}
5 changes: 1 addition & 4 deletions src/snowflake/snowpark/_internal/analyzer/analyzer.py
@@ -956,10 +956,7 @@ def do_resolve_with_resolved_children(
             schema_query = schema_query_for_values_statement(logical_plan.output)

             if logical_plan.data:
-                if (
-                    len(logical_plan.output) * len(logical_plan.data)
-                    < ARRAY_BIND_THRESHOLD
-                ):
+                if not logical_plan.is_large_local_data:
                     return self.plan_builder.query(
                         values_statement(logical_plan.output, logical_plan.data),
                         logical_plan,
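
The hunk above swaps an inline size comparison for a property on the plan node. The property's definition is not part of this diff, so the following is only a minimal sketch of how such a refactor might look, assuming `is_large_local_data` wraps the same cell-count comparison that was removed. `ValuesPlan`, `choose_strategy`, and the threshold value are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical value for illustration; the real ARRAY_BIND_THRESHOLD
# is defined elsewhere and is not shown in this diff.
ARRAY_BIND_THRESHOLD = 512


@dataclass
class ValuesPlan:
    """Toy stand-in for a plan node that carries local rows."""

    output: List[str]  # column names
    data: List[Tuple] = field(default_factory=list)  # local rows

    @property
    def is_large_local_data(self) -> bool:
        # Assumed to be the negation of the removed inline check:
        # number of cells (columns * rows) compared to the threshold.
        return len(self.output) * len(self.data) >= ARRAY_BIND_THRESHOLD


def choose_strategy(plan: ValuesPlan) -> str:
    """Mirror the branch structure of the hunk with descriptive labels."""
    if not plan.data:
        return "no local data"
    if not plan.is_large_local_data:
        return "inline VALUES statement"
    return "large-data path"


small = ValuesPlan(output=["a", "b"], data=[(1, 2)] * 3)    # 6 cells
large = ValuesPlan(output=["a", "b"], data=[(1, 2)] * 256)  # 512 cells
```

Moving the comparison onto the node keeps the size rule in one place, so other callers can ask the same question without repeating the arithmetic.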
Original file line number Diff line number Diff line change
@@ -2,11 +2,12 @@
 # Copyright (c) 2012-2024 Snowflake Computing Inc. All rights reserved.
 #

-from typing import AbstractSet, Optional
+from typing import AbstractSet, List, Optional

 from snowflake.snowpark._internal.analyzer.expression import (
     Expression,
     derive_dependent_columns,
+    derive_dependent_columns_with_duplication,
 )
 from snowflake.snowpark._internal.analyzer.query_plan_analysis_utils import (
     PlanNodeCategory,
@@ -29,6 +30,9 @@ def __str__(self):
     def dependent_column_names(self) -> Optional[AbstractSet[str]]:
         return derive_dependent_columns(self.left, self.right)

+    def dependent_column_names_with_duplication(self) -> List[str]:
+        return derive_dependent_columns_with_duplication(self.left, self.right)
+
     @property
     def plan_node_category(self) -> PlanNodeCategory:
         return PlanNodeCategory.LOW_IMPACT
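
The new method returns a `List[str]` where the existing one returns a set, which suggests it keeps one entry per column reference rather than one per distinct column. The body of `derive_dependent_columns_with_duplication` is not shown in this diff, so the sketch below is an assumption based on the names and return types only. `Col`, `BinOp`, and both helper functions are hypothetical toy versions, not the Snowpark implementations.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class Col:
    """Leaf expression: a reference to a named column."""

    name: str


@dataclass(frozen=True)
class BinOp:
    """Binary expression with a left and a right child."""

    left: "Expr"
    right: "Expr"


Expr = Union[Col, BinOp]


def dependent_columns(*exprs: Expr) -> Set[str]:
    """Distinct column names referenced anywhere in the expressions."""
    names: Set[str] = set()
    for expr in exprs:
        if isinstance(expr, Col):
            names.add(expr.name)
        else:
            names |= dependent_columns(expr.left, expr.right)
    return names


def dependent_columns_with_duplication(*exprs: Expr) -> List[str]:
    """Every column reference in traversal order, repeats included."""
    names: List[str] = []
    for expr in exprs:
        if isinstance(expr, Col):
            names.append(expr.name)
        else:
            names.extend(dependent_columns_with_duplication(expr.left, expr.right))
    return names


# (A op B) op A references column A twice.
expr = BinOp(BinOp(Col("A"), Col("B")), Col("A"))
distinct = dependent_columns(expr)
with_repeats = dependent_columns_with_duplication(expr)
```

Under this reading, the list form preserves how often each column is referenced, which the set form discards.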