[SNOW-1826257]: Refactor docs to provide one place for supported aggregation functions #2680

Merged
merged 8 commits into from
Dec 16, 2024
62 changes: 62 additions & 0 deletions docs/source/modin/supported/agg_supp.rst
@@ -0,0 +1,62 @@
:orphan:

Supported Aggregation Functions
====================================

This page lists the aggregation functions supported by ``DataFrame.agg``,
``Series.agg``, ``DataFrameGroupBy.agg``, and ``SeriesGroupBy.agg``.
In the table below, the first column gives the aggregation function's name. The remaining four columns
indicate, in order, whether the function is supported by ``DataFrame.agg``, ``Series.agg``,
``DataFrameGroupBy.agg``, and ``SeriesGroupBy.agg``.

.. note::
    ``Y`` stands for yes (supports distributed implementation), ``N`` stands for no (API simply errors out),
    and ``P`` stands for partial (meaning some parameters may not be supported yet).

Both Python builtin and NumPy functions are supported for ``DataFrameGroupBy.agg`` and ``SeriesGroupBy.agg``.

+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| Aggregation Function        | ``DataFrame.agg`` supports? (Y/N/P) | ``Series.agg`` supports? (Y/N/P) | ``DataFrameGroupBy.agg`` supports? (Y/N/P) | ``SeriesGroupBy.agg`` supports? (Y/N/P) |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``count``                   | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
|                             | For ``axis=1``, ``Y`` if index is   |                                  |                                            |                                         |
|                             | not a MultiIndex.                   |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``mean``                    | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``min``                     | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
|                             | For ``axis=1``, ``Y`` if index is   |                                  |                                            |                                         |
|                             | not a MultiIndex.                   |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``max``                     | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
|                             | For ``axis=1``, ``Y`` if index is   |                                  |                                            |                                         |
|                             | not a MultiIndex.                   |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``sum``                     | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
|                             | For ``axis=1``, ``Y`` if index is   |                                  |                                            |                                         |
|                             | not a MultiIndex.                   |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``median``                  | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``size``                    | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``std``                     | ``P`` for ``axis=0`` - only when    | ``P`` - only when ``ddof=0``     | ``P`` - only when ``ddof=0``               | ``P`` - only when ``ddof=0``            |
|                             | ``ddof=0`` or ``ddof=1``.           | or ``ddof=1``.                   | or ``ddof=1``.                             | or ``ddof=1``.                          |
|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``var``                     | ``P`` for ``axis=0`` - only when    | ``P`` - only when ``ddof=0``     | ``P`` - only when ``ddof=0``               | ``P`` - only when ``ddof=0``            |
|                             | ``ddof=0`` or ``ddof=1``.           | or ``ddof=1``.                   | or ``ddof=1``.                             | or ``ddof=1``.                          |
|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``quantile``                | ``P`` for ``axis=0`` - only when    | ``P`` - only when ``q`` is the   | ``P`` - only when ``q`` is the             | ``P`` - only when ``q`` is the          |
|                             | ``q`` is the default value or       | default value or a scalar.       | default value or a scalar.                 | default value or a scalar.              |
|                             | a scalar.                           |                                  |                                            |                                         |
|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
| ``len``                     | ``N``                               | ``N``                            | ``Y``                                      | ``Y``                                   |
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
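
As a rough illustration of how the table above can be read, the sketch below encodes its flags in a plain Python lookup. The names ``APIS``, ``AXIS0_FLAGS``, ``AXIS1_FUNCS``, and ``support_flag`` are hypothetical and are not part of Snowpark pandas; treating a MultiIndex with ``axis=1`` as ``N`` is an assumption, since the table only states when the result is ``Y``.

```python
# Hypothetical encoding of the support table; none of these names exist in
# Snowpark pandas itself.
APIS = ("DataFrame.agg", "Series.agg", "DataFrameGroupBy.agg", "SeriesGroupBy.agg")

# One flag per API, in APIS order. For DataFrame.agg these are the axis=0 flags.
AXIS0_FLAGS = {
    "count": "YYYY",
    "mean": "YYYY",
    "min": "YYYY",
    "max": "YYYY",
    "sum": "YYYY",
    "median": "YYYY",
    "size": "YYYY",
    "std": "PPPP",       # only ddof=0 or ddof=1
    "var": "PPPP",       # only ddof=0 or ddof=1
    "quantile": "PPPP",  # only default or scalar q
    "len": "NNYY",
}

# Functions the table marks "Y" for DataFrame.agg with axis=1,
# provided the index is not a MultiIndex.
AXIS1_FUNCS = frozenset({"count", "min", "max", "sum"})


def support_flag(func, api, axis=0, multiindex=False):
    """Return "Y", "N", or "P" for `func` under `api`, following the table.

    `axis` and `multiindex` only affect DataFrame.agg; the table gives no
    axis=1 entries for the other three APIs.
    """
    if api not in APIS:
        raise ValueError(f"unknown API: {api!r}")
    flags = AXIS0_FLAGS[func]  # KeyError for functions the table does not list
    if api == "DataFrame.agg" and axis == 1:
        # Assumption: a MultiIndex with axis=1 is reported as "N".
        return "Y" if func in AXIS1_FUNCS and not multiindex else "N"
    return flags[APIS.index(api)]


print(support_flag("quantile", "Series.agg"))       # P
print(support_flag("sum", "DataFrame.agg", axis=1)) # Y
```

A function missing from the table raises ``KeyError`` here rather than returning ``N``, because the table itself does not say how unlisted functions behave.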
12 changes: 3 additions & 9 deletions docs/source/modin/supported/dataframe_supported.rst
@@ -65,15 +65,9 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``add_suffix``              | Y                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``agg``                     | P                               | ``margins``, ``observed``,       | If ``axis == 0``: ``Y`` when function is one of    |
-|                             |                                 | ``sort``                         | ``count``, ``mean``, ``min``, ``max``, ``sum``,    |
-|                             |                                 |                                  | ``median``, ``size``; ``std`` and ``var``          |
-|                             |                                 |                                  | supported with ``ddof=0`` or ``ddof=1``;           |
-|                             |                                 |                                  | ``quantile`` is supported when ``q`` is the        |
-|                             |                                 |                                  | default value or a scalar.                         |
-|                             |                                 |                                  | If ``axis == 1``: ``Y`` when function is           |
-|                             |                                 |                                  | ``count``, ``min``, ``max``, or ``sum`` and the    |
-|                             |                                 |                                  | index is not a MultiIndex.                         |
+| ``agg``                     | P                               | ``margins``, ``observed``,       | Check                                              |
+|                             |                                 | ``sort``                         | `Supported Aggregation Functions <agg_supp.html>`_ |
+|                             |                                 |                                  | for a list of supported functions.                 |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``aggregate``               | P                               | ``margins``, ``observed``,       | See ``agg``                                        |
 |                             |                                 | ``sort``                         |                                                    |
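
The ``ddof`` restriction on ``std`` and ``var`` mentioned in this hunk refers to the divisor ``n - ddof`` used when averaging squared deviations: ``ddof=0`` gives the population statistic and ``ddof=1`` (the pandas default) gives the sample statistic. The following is a minimal pure-Python sketch of that definition; the helper names ``variance`` and ``std`` are illustrative and say nothing about how Snowpark pandas computes the result.

```python
import math


def variance(values, ddof=1):
    """Variance with divisor n - ddof (ddof=0: population, ddof=1: sample)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def std(values, ddof=1):
    """Standard deviation as the square root of the matching variance."""
    return math.sqrt(variance(values, ddof=ddof))


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(sample, ddof=0), std(sample, ddof=0))
print(variance(sample, ddof=1), std(sample, ddof=1))
```

The two settings differ only in the divisor, so the gap between them shrinks as the number of values grows.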
7 changes: 3 additions & 4 deletions docs/source/modin/supported/groupby_supported.rst
@@ -30,10 +30,9 @@ Function application
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | GroupBy method              | Snowpark implemented? (Y/N/P/D) | Missing parameters               | Notes for current implementation                   |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``agg``                     | P                               | ``axis`` other than 0 is not     | ``Y``, support functions are count, mean, min, max,|
-|                             |                                 | implemented.                     | sum, median, std, size, len, and var               |
-|                             |                                 |                                  | (including both Python and NumPy functions)        |
-|                             |                                 |                                  | otherwise ``N``.                                   |
+| ``agg``                     | P                               | ``axis`` other than 0 is not     | Check                                              |
+|                             |                                 | implemented.                     | `Supported Aggregation Functions <agg_supp.html>`_ |
+|                             |                                 |                                  | for a list of supported functions.                 |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``aggregate``               | P                               | ``axis`` other than 0 is not     | See ``agg``                                        |
 |                             |                                 | implemented.                     |                                                    |
9 changes: 3 additions & 6 deletions docs/source/modin/supported/series_supported.rst
@@ -76,12 +76,9 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``add_suffix``              | Y                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``agg``                     | P                               |                                  | ``Y`` when function is one of ``count``,           |
-|                             |                                 |                                  | ``mean``, ``min``, ``max``, ``sum``, ``median``,   |
-|                             |                                 |                                  | ``size``; ``std`` and ``var`` supported with       |
-|                             |                                 |                                  | ``ddof=0`` or ``ddof=1``; ``quantile`` is          |
-|                             |                                 |                                  | supported when ``q`` is the default value          |
-|                             |                                 |                                  | or a scalar.                                       |
+| ``agg``                     | P                               |                                  | Check                                              |
+|                             |                                 |                                  | `Supported Aggregation Functions <agg_supp.html>`_ |
+|                             |                                 |                                  | for a list of supported functions.                 |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``aggregate``               | P                               |                                  | See ``agg``                                        |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
48 changes: 48 additions & 0 deletions tests/integ/modin/groupby/test_groupby_basic_agg.py
@@ -284,6 +284,54 @@ def test_groupby_agg_with_float_dtypes_named_agg() -> None:
)


@pytest.mark.parametrize(
    "grpby_fn",
    [
        lambda gr: gr.quantile(),
        lambda gr: gr.quantile(q=0.3),
    ],
)
@sql_count_checker(query_count=1)
def test_groupby_agg_quantile_with_int_dtypes(grpby_fn) -> None:
    # One grouping column plus integer columns of several widths,
    # some of them containing missing values.
    native_df = native_pd.DataFrame(
        {
            "col1_grp": ["g1", "g2", "g0", "g0", "g2", "g3", "g0", "g2", "g3"],
            "col2_int64": np.arange(9, dtype="int64") // 3,
            "col3_int_identical": [2] * 9,
            "col4_int32": np.arange(9, dtype="int32") // 4,
            "col5_int16": np.arange(9, dtype="int16") // 3,
            "col6_mixed": np.concatenate(
                [
                    np.arange(3, dtype="int64") // 3,
                    np.arange(3, dtype="int32") // 3,
                    np.arange(3, dtype="int16") // 3,
                ]
            ),
            "col7_int_missing": [5, 6, np.nan, 2, 1, np.nan, 5, np.nan, np.nan],
            "col8_mixed_missing": np.concatenate(
                [
                    np.arange(2, dtype="int64") // 3,
                    [np.nan],
                    np.arange(2, dtype="int32") // 3,
                    [np.nan],
                    np.arange(2, dtype="int16") // 3,
                    [np.nan],
                ]
            ),
        }
    )
    snowpark_pandas_df = pd.DataFrame(native_df)
    by = "col1_grp"
    snowpark_pandas_groupby = snowpark_pandas_df.groupby(by=by)
    pandas_groupby = native_df.groupby(by=by)
    # Apply the same quantile call to both groupby objects and compare the results.
    eval_snowpark_pandas_result(
        snowpark_pandas_groupby,
        pandas_groupby,
        grpby_fn,
        comparator=assert_snowpark_pandas_equals_to_pandas_with_coerce_to_float64,
    )
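
For orientation, the two parametrized calls in the test above request the default quantile (`q=0.5`, the median) and a scalar `q=0.3` per group. The sketch below reproduces that computation in plain Python for `col2_int64` only, assuming linear interpolation between neighbouring ranks (the pandas default). The helpers `linear_quantile` and `grouped_quantile` are hypothetical, ignore missing values, and are not how Snowpark pandas evaluates the query.

```python
import math
from collections import defaultdict


def linear_quantile(values, q=0.5):
    """Quantile of `values` using linear interpolation between ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("no values")
    pos = q * (len(data) - 1)         # fractional rank of the quantile
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    frac = pos - lo
    return data[lo] + frac * (data[hi] - data[lo])


def grouped_quantile(keys, values, q=0.5):
    """Group `values` by `keys`, then take the quantile of each group."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: linear_quantile(vals, q) for key, vals in sorted(groups.items())}


# Same grouping column and col2_int64 values as in the test above.
keys = ["g1", "g2", "g0", "g0", "g2", "g3", "g0", "g2", "g3"]
col2_int64 = [i // 3 for i in range(9)]

print(grouped_quantile(keys, col2_int64))
print(grouped_quantile(keys, col2_int64, q=0.3))
```

Because interpolation can return non-integer values for integer input, it is plausible that this is why the test compares results after coercing to float64.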


@sql_count_checker(query_count=2)
def test_groupby_agg_with_int_dtypes(int_to_decimal_float_agg_method) -> None:
snowpark_pandas_df = pd.DataFrame(