Add 'group()' built-in function to DaphneDSL. #921

Open
wants to merge 2 commits into
base: main

saminbassiri (Contributor) commented Nov 20, 2024

Add group() Built-in Function to DaphneDSL for Grouping and Aggregation

Description

This pull request introduces a new group() built-in function to DaphneDSL, which creates a GroupOp in DaphneIR. It closes issue #903.


Changes Implemented

  1. group() Built-in Function in DaphneDSL:

    • Interface: group(arg:frame, groupCols:str, ..., sumCol:str)
    • Accepts:
      • A frame as input.
      • An arbitrary number of columns to group on.
      • A single column whose values are summed within each group.
    • Aggregation Support:
      • Only supports SUM as the aggregation function.
  2. Kernel Function Updates:

    • Updated the order and extractCol kernel functions to process string values correctly.
    • Extended the group kernel function to handle string values.
  3. Test Cases:

    • Added script-level tests to validate the functionality of the group() function in DaphneDSL.
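The interface described above can be illustrated with a small reference model of its semantics. This is a sketch in plain Python, not DAPHNE code: the function name `group_sum`, the row-of-dicts representation of a frame, and the `SUM(...)` output label are all assumptions made for illustration. Only the argument order (grouping columns first, the column to sum last) is taken from the description.

```python
def group_sum(rows, *cols):
    """Group `rows` (a list of dicts, one per frame row) and sum one column.

    Mirrors the argument order of the proposed built-in: every name in `cols`
    except the last is a grouping column, and the last is the column to sum.
    """
    *group_cols, sum_col = cols  # raises ValueError if no column is given
    label = f"SUM({sum_col})"    # hypothetical output label, not taken from the PR
    totals = {}                  # group key -> running sum, in first-seen order
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        totals[key] = totals.get(key, 0) + row[sum_col]
    # One output row per distinct combination of grouping values.
    return [{**dict(zip(group_cols, key)), label: total}
            for key, total in totals.items()]


sales = [
    {"city": "Graz", "year": 2023, "amount": 10},
    {"city": "Berlin", "year": 2023, "amount": 5},
    {"city": "Graz", "year": 2023, "amount": 7},
]
print(group_sum(sales, "city", "year", "amount"))
```

The model groups on string and numeric columns alike, which is the case the kernel changes in this PR are meant to enable.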

 - This built-in function creates a GroupOp in DaphneIR.
 - Supports only 'SUM' as the aggregation function.
 - Accepts exactly one aggregation column.
 - Accepts an arbitrary number of columns to group on.
- Add support for string values in the 'group' kernel function.
  - SUM, MIN, and MAX are the only aggregation functions that can be applied to string columns.
  - Other aggregation functions throw an exception if they receive strings as arguments or would produce strings as results.
  - Additionally, 'DeduceValueTypeAndExecute' cannot handle string values because of operations that are not supported on strings.
  - Therefore, either 'ColumnGroupAggStringVTArg', which is specialized for strings, is used, or 'ColumnGroupAgg' is called exclusively with string values.
- The 'group' function internally calls the 'order' and 'extractCol' kernel functions.
  - These two functions are updated to handle string values correctly.
- Add script-level test cases to validate the new functionality.
- Close issue daphne-eu#903
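
The notes above say that 'group' relies on 'order' and 'extractCol', which suggests why both needed string support. A minimal sketch of one way such a pipeline can work, assuming a sort-then-scan strategy (the PR text does not confirm this is how the kernel is structured): rows are ordered by the grouping columns, then each run of equal keys is aggregated. The function name `group_sum_sorted` and the row-of-dicts layout are hypothetical.

```python
from itertools import groupby


def group_sum_sorted(rows, group_cols, sum_col):
    """Sort-based grouping: order rows by key, then sum each run of equal keys."""

    def key(row):
        return tuple(row[c] for c in group_cols)

    # Stand-in for an 'order' step: string keys must be comparable for this to work.
    ordered = sorted(rows, key=key)

    result = []
    for group_key, run in groupby(ordered, key=key):
        # Stand-in for an 'extractCol' step: pull the aggregation column out of the run.
        values = [row[sum_col] for row in run]
        result.append((group_key, sum(values)))
    return result


people = [
    {"name": "bob", "points": 1},
    {"name": "alice", "points": 2},
    {"name": "bob", "points": 3},
]
print(group_sum_sorted(people, ["name"], "points"))
```

In this model, grouping on a string column only works if the ordering step can compare strings and the extraction step can return them unchanged, which matches the two kernel updates listed above.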