Commit

Update sort_first_koalas.py
svittoz authored Apr 18, 2024
1 parent 2bb5071 commit 58b01a1
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions eds_scikit/utils/sort_first_koalas.py
@@ -4,24 +4,26 @@


 def sort_values_first_koalas(
-    dataframe: DataFrame, by_cols: List[str], cols: List[str], ascending: bool = True
+    dataframe: DataFrame, by_cols: List[str], cols: List[str], disambiguate_col: str, ascending: bool = True
 ) -> DataFrame:
-    """Use this function to obtain in koalas the same ouput as dataframe.sort_values(cols).groupby(by_cols).first() in pandas.
+    """Use this function to obtain in koalas the same ouput as dataframe.sort_values([*cols, disambiguate_col]).groupby(by_cols).first() in pandas.
-    If you want the output to be deterministic, provide an id column of your dataframe as the last element of variable cols.
+    disambiguate_col must be provided to make sure the output is deterministic
     Parameters
     ----------
     dataframe : DataFrame
     by_cols : List[str]
     cols : List[str]
+    disambiguate_col : List[str]
     ascending : bool, optional
     Returns
     -------
     DataFrame
     """
+    cols = [*cols, disambiguate_col]
     for col in cols:
         dataframe_min_max = dataframe.groupby(by_cols, as_index=False)[col]
         dataframe_min_max = (
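
For reference, the pandas expression named in the docstring can be sketched as below. This is only an illustration of the target behaviour, not the koalas implementation in this commit. The helper name `sort_values_first_pandas` and the `visits` table with its columns are hypothetical, and `disambiguate_col` is taken as a single column name, following the signature's `str` annotation rather than the docstring's `List[str]`. The example has no missing values, since pandas `first()` returns the first non-null entry of each column.

```python
from typing import List

import pandas as pd


def sort_values_first_pandas(
    dataframe: pd.DataFrame,
    by_cols: List[str],
    cols: List[str],
    disambiguate_col: str,
    ascending: bool = True,
) -> pd.DataFrame:
    """Hypothetical pandas counterpart of the expression in the docstring."""
    # Append the tie-breaking column last so that rows with equal values
    # in `cols` are still ordered in a reproducible way.
    sort_cols = [*cols, disambiguate_col]
    return dataframe.sort_values(sort_cols, ascending=ascending).groupby(by_cols).first()


# Hypothetical data: person 1 has two visits sharing the same start date.
visits = pd.DataFrame(
    {
        "person_id": [1, 1, 1, 2],
        "visit_start": ["2024-01-01", "2024-01-01", "2024-02-01", "2024-03-01"],
        "visit_id": [12, 11, 10, 20],
    }
)

result = sort_values_first_pandas(visits, ["person_id"], ["visit_start"], "visit_id")

# Shuffling the input rows does not change the outcome, because the tie on
# visit_start is resolved by visit_id.
shuffled = visits.sample(frac=1, random_state=0)
shuffled_result = sort_values_first_pandas(
    shuffled, ["person_id"], ["visit_start"], "visit_id"
)
```

Without the tie-breaking column, which of the two visits on the same start date is kept for person 1 would depend on the order in which the rows arrive. That order is not guaranteed on a distributed backend, which is presumably why the commit makes `disambiguate_col` a required argument.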