Commit

Merge branch 'main' into aalam-SNOW-1546396-add-large-query-breakdown-optimization
sfc-gh-aalam committed Aug 28, 2024
2 parents 7073a83 + 572d01d commit 783f973
Showing 114 changed files with 5,006 additions and 1,552 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/precommit.yml
@@ -85,7 +85,7 @@ jobs:
matrix:
os: [macos-latest, windows-latest-64-cores, ubuntu-latest-64-cores]
python-version: ["3.9", "3.10", "3.11"]
cloud-provider: [aws, azure, gcp]
cloud-provider: [aws, gcp] # TODO: SNOW-1643374 add azure back
exclude:
# only run macos with aws py3.9 for doctest
- os: macos-latest
@@ -309,7 +309,7 @@ jobs:
matrix:
os: [macos-latest, windows-latest-64-cores, ubuntu-latest-64-cores]
python-version: [ "3.9", "3.10", "3.11" ]
cloud-provider: [aws, azure, gcp]
cloud-provider: [aws, gcp] # TODO: SNOW-1643374 add azure back
exclude:
# only run macos with aws py3.9 for doctest
- os: macos-latest
25 changes: 21 additions & 4 deletions CHANGELOG.md
@@ -4,8 +4,15 @@

### Snowpark Python API Updates

#### New Features

- Added the following new functions in `snowflake.snowpark.functions`:
- `array_remove`
- `ln`

#### Improvements

- Added support for `ln` in `snowflake.snowpark.functions`
- Added support for specifying the following to `DataFrameWriter.save_as_table`:
- `enable_schema_evolution`
- `data_retention_time`
@@ -26,6 +33,7 @@
- Fixed a bug in `session.read.csv` that caused an error when setting `PARSE_HEADER = True` in an externally defined file format.
- Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
- Fixed a bug in `session.get_session_stage` that referenced a non-existing stage after switching database or schema.
- Fixed a bug where calling `DataFrame.to_snowpark_pandas_dataframe` without explicitly initializing the Snowpark pandas plugin caused an error.

### Snowpark Local Testing Updates

@@ -43,12 +51,15 @@

#### New Features

- Added limited support for the `Timedelta` type, including
- supporting tracking the Timedelta type through `copy`, `cache_result`, `shift`, `sort_index`.
- converting non-timedelta to timedelta via `astype`.
- Added limited support for the `Timedelta` type, including the following features. Snowpark pandas will raise `NotImplementedError` for unsupported `Timedelta` use cases.
- supporting tracking the Timedelta type through `copy`, `cache_result`, `shift`, `sort_index`, `assign`, `bfill`, `ffill`, `fillna`, `compare`, `diff`, `drop`, `dropna`, `duplicated`, `empty`, `equals`, `insert`, `isin`, `isna`, `items`, `iterrows`, `join`, `len`, `mask`, `melt`, `merge`, `nlargest`, `nsmallest`.
- converting non-timedelta to timedelta via `astype`.
- `NotImplementedError` will be raised for the remaining methods that do not support `Timedelta`.
- support for subtracting two timestamps to get a Timedelta.
- support indexing with Timedelta data columns.
- support indexing with Timedelta data columns.
- support for adding or subtracting timestamps and `Timedelta`.
- support for binary arithmetic between two `Timedelta` values.
- support for lazy `TimedeltaIndex`.
- Added support for index's arithmetic and comparison operators.
- Added support for `Series.dt.round`.
- Added documentation pages for `DatetimeIndex`.
@@ -60,6 +71,12 @@
- Added support for `pd.merge_asof`.
- Added support for `Series.dt.normalize` and `DatetimeIndex.normalize`.
- Added support for `Index.is_boolean`, `Index.is_integer`, `Index.is_floating`, `Index.is_numeric`, and `Index.is_object`.
- Added support for `DatetimeIndex.round`, `DatetimeIndex.floor` and `DatetimeIndex.ceil`.
- Added support for `Series.dt.days_in_month` and `Series.dt.daysinmonth`.
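
The `Timedelta` arithmetic items in the list above follow the same type rules as plain Python datetime arithmetic. The following is a minimal sketch using only the standard library, not Snowpark pandas itself; the variable names and dates are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Illustrative timestamps (assumed values, not taken from the changelog).
start = datetime(2024, 8, 1, 9, 30)
end = datetime(2024, 8, 28, 17, 0)

# timestamp - timestamp -> timedelta
elapsed = end - start

# timestamp +/- timedelta -> timestamp
deadline = start + timedelta(days=7)
earlier = end - timedelta(hours=2)

# timedelta +/- timedelta -> timedelta
total = elapsed + timedelta(minutes=30)
remaining = elapsed - timedelta(days=1)

print(type(elapsed).__name__, elapsed)
print(type(deadline).__name__, deadline)
print(type(total).__name__, total)
```

In pandas, `pd.Timedelta` is a subclass of `datetime.timedelta`, so the result types line up with this sketch. Per the changelog, Snowpark pandas raises `NotImplementedError` for `Timedelta` use cases it does not support yet.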

#### Improvements

- Refactored `quoted_identifier_to_snowflake_type` to avoid making metadata queries if the types have been cached locally.

#### Bug Fixes

40 changes: 40 additions & 0 deletions docs/source/modin/indexing.rst
@@ -220,3 +220,43 @@ DatetimeIndex

    DatetimeIndex.mean
    DatetimeIndex.std

.. _api.timedeltaindex:

TimedeltaIndex
--------------

.. autosummary::
    :toctree: pandas_api/

    TimedeltaIndex

.. rubric:: `TimedeltaIndex` Components

.. autosummary::
    :toctree: pandas_api/

    TimedeltaIndex.days
    TimedeltaIndex.seconds
    TimedeltaIndex.microseconds
    TimedeltaIndex.nanoseconds
    TimedeltaIndex.components
    TimedeltaIndex.inferred_freq

.. rubric:: `TimedeltaIndex` Conversion

.. autosummary::
    :toctree: pandas_api/

    TimedeltaIndex.as_unit
    TimedeltaIndex.to_pytimedelta
    TimedeltaIndex.round
    TimedeltaIndex.floor
    TimedeltaIndex.ceil

.. rubric:: `TimedeltaIndex` Methods

.. autosummary::
    :toctree: pandas_api/

    TimedeltaIndex.mean
2 changes: 2 additions & 0 deletions docs/source/modin/series.rst
@@ -256,6 +256,8 @@ Series
Series.dt.weekday
Series.dt.dayofyear
Series.dt.day_of_year
Series.dt.days_in_month
Series.dt.daysinmonth
Series.dt.quarter
Series.dt.isocalendar
Series.dt.month_name
4 changes: 2 additions & 2 deletions docs/source/modin/supported/dataframe_supported.rst
@@ -98,8 +98,8 @@ Methods
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``assign``                  | Y                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``astype``                  | P                               |                                  | ``N``: from string to datetime or ``errors ==      |
|                             |                                 |                                  | "ignore"``                                         |
| ``astype``                  | P                               |                                  | ``N`` if from string to datetime/timedelta or      |
|                             |                                 |                                  | ``errors == "ignore"``                             |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``at_time``                 | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
6 changes: 3 additions & 3 deletions docs/source/modin/supported/datetime_index_supported.rst
@@ -86,11 +86,11 @@ Methods
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``tz_localize``             | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``round``                   | N                               |                                  |                                                    |
| ``round``                   | P                               | ``ambiguous``, ``nonexistent``   |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``floor``                   | N                               |                                  |                                                    |
| ``floor``                   | P                               | ``ambiguous``, ``nonexistent``   |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``ceil``                    | N                               |                                  |                                                    |
| ``ceil``                    | P                               | ``ambiguous``, ``nonexistent``   |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``to_period``               | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
1 change: 1 addition & 0 deletions docs/source/modin/supported/index.rst
@@ -16,6 +16,7 @@ To view the docs for the most recent release, check that you’re viewing the st
dataframe_supported
index_supported
datetime_index_supported
timedelta_index_supported
window_supported
groupby_supported
resampling_supported
4 changes: 2 additions & 2 deletions docs/source/modin/supported/series_dt_supported.rst
@@ -66,9 +66,9 @@ the method in the left column.
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``is_leap_year``            | Y                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``daysinmonth``             | N                               |                                                    |
| ``daysinmonth``             | Y                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``days_in_month``           | N                               |                                                    |
| ``days_in_month``           | Y                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``tz``                      | N                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
4 changes: 2 additions & 2 deletions docs/source/modin/supported/series_supported.rst
@@ -105,8 +105,8 @@ Methods
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``asof``                    | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``astype``                  | P                               |                                  | ``N``: from string to datetime or ``errors ==      |
|                             |                                 |                                  | "ignore"``                                         |
| ``astype``                  | P                               |                                  | ``N`` if from string to datetime/timedelta or      |
|                             |                                 |                                  | ``errors == "ignore"``                             |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``at_time``                 | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
48 changes: 48 additions & 0 deletions docs/source/modin/supported/timedelta_index_supported.rst
@@ -0,0 +1,48 @@
``pd.TimedeltaIndex`` supported APIs
====================================

The following tables are structured as follows: the first column contains the attribute or method name.
The second column is a flag for whether or not there is an implementation in Snowpark for
the attribute or method in the left column.

.. note::
    ``Y`` stands for yes, i.e., supports distributed implementation, ``N`` stands for no and API simply errors out,
    ``P`` stands for partial (meaning some parameters may not be supported yet), and ``D`` stands for defaults to single
    node pandas execution via UDF/Sproc.

Attributes

+-----------------------------+---------------------------------+----------------------------------------------------+
| TimedeltaIndex attribute    | Snowpark implemented? (Y/N/P/D) | Notes for current implementation                   |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``days``                    | N                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``seconds``                 | N                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``microseconds``            | N                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``nanoseconds``             | N                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``components``              | N                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+
| ``inferred_freq``           | N                               |                                                    |
+-----------------------------+---------------------------------+----------------------------------------------------+


Methods

+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| TimedeltaIndex method       | Snowpark implemented? (Y/N/P/D) | Missing parameters               | Notes for current implementation                   |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``as_unit``                 | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``to_pytimedelta``          | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``round``                   | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``floor``                   | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``ceil``                    | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``mean``                    | N                               |                                  |                                                    |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
2 changes: 2 additions & 0 deletions docs/source/snowpark/functions.rst
@@ -43,6 +43,7 @@ Functions
array_min
array_position
array_prepend
array_remove
array_size
array_slice
array_sort
@@ -195,6 +196,7 @@ Functions
length
listagg
lit
ln
locate
log
lower
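
The two functions added to this listing, `array_remove` and `ln`, wrap Snowflake SQL functions and need a live session to run. As a rough plain-Python model of the results they are expected to compute (an assumption based on the function names and the SQL functions they correspond to; `array_remove_model` and `ln_model` are hypothetical helpers, not part of Snowpark):

```python
import math
from typing import Any, List


def array_remove_model(values: List[Any], element: Any) -> List[Any]:
    """Model of array_remove: drop every item equal to `element`.

    NULL handling and Snowflake type coercion are not modeled here.
    """
    return [v for v in values if v != element]


def ln_model(x: float) -> float:
    """Model of ln: the natural (base-e) logarithm of x."""
    return math.log(x)


print(array_remove_model([1, 2, 2, 3], 2))
print(ln_model(math.e))
```

This sketch only illustrates the intended semantics; the Snowpark versions operate on columns and are evaluated lazily in Snowflake.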
3 changes: 3 additions & 0 deletions src/snowflake/snowpark/_internal/analyzer/snowflake_plan.py
@@ -1608,6 +1608,9 @@ def with_query_block(

new_query = project_statement([], name)

# Note: we do not propagate the query parameters of the child here;
# they will be propagated along with the definition during the
# query generation stage.
queries = child.queries[:-1] + [Query(sql=new_query)]
# propagate the cte table
referenced_ctes = {name}.union(child.referenced_ctes)
4 changes: 2 additions & 2 deletions src/snowflake/snowpark/_internal/compiler/query_generator.py
@@ -64,7 +64,7 @@ def __init__(
# NOTE: the dict used here is an ordered dict; every with query block definition is recorded in the
# order in which the with query block is visited. The order is important to make sure the dependencies
# between the CTE definitions are satisfied.
self.resolved_with_query_block: Dict[str, str] = {}
self.resolved_with_query_block: Dict[str, Query] = {}

def generate_queries(
self, logical_plans: List[LogicalPlan]
@@ -217,7 +217,7 @@ def do_resolve_with_resolved_children(
if logical_plan.name not in self.resolved_with_query_block:
self.resolved_with_query_block[
logical_plan.name
] = resolved_child.queries[-1].sql
] = resolved_child.queries[-1]

resolved_plan = self.plan_builder.with_query_block(
logical_plan.name,
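
The change above stores the whole `Query` object per CTE name instead of only its SQL string. One plausible motivation, in line with the comment added in `snowflake_plan.py`, is that bind parameters can then travel with each definition. The commit does not show how the stored queries are consumed, so the following is only a sketch under stated assumptions: `Query` here is a simplified stand-in dataclass, and `record_block` and `build_with_clause` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Query:
    """Simplified stand-in for Snowpark's internal Query class (assumption)."""
    sql: str
    params: List[Any] = field(default_factory=list)


# Python dicts preserve insertion order, so definitions stay in visit order
# and a CTE is always defined before any CTE that depends on it.
resolved_with_query_block: Dict[str, Query] = {}


def record_block(name: str, query: Query) -> None:
    """Record a CTE definition the first time its name is visited."""
    if name not in resolved_with_query_block:
        resolved_with_query_block[name] = query


def build_with_clause(blocks: Dict[str, Query]) -> Query:
    """Hypothetical helper: join definitions and carry their parameters along."""
    definitions = ", ".join(f"{name} AS ({q.sql})" for name, q in blocks.items())
    params = [p for q in blocks.values() for p in q.params]
    return Query(sql=f"WITH {definitions}", params=params)


record_block("cte_a", Query("SELECT * FROM t WHERE x > ?", [10]))
record_block("cte_b", Query("SELECT * FROM cte_a WHERE y = ?", ["k"]))
record_block("cte_a", Query("SELECT 1"))  # ignored: cte_a is already recorded

final = build_with_clause(resolved_with_query_block)
print(final.sql)
print(final.params)
```

With a `Dict[str, str]`, the parameter lists in this sketch would have been lost at the point where the definition was recorded.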