[multistage][feature] FunctionRegistry unification #12302

walterddr
Contributor

@walterddr walterddr commented Jan 22, 2024

This is an improvement on #12229 that addresses the recent modification in #12278.

Design Doc

https://docs.google.com/document/d/1VcH8k6vU3dTdb717qJo1bgZAAIkeTso6KwKXkzpbAjk/edit

Design Diagram

Framework Changes Details

  • Refactored FunctionRegistry
  • Separated the original SqlStdOperatorTable and PinotCatalog into three components:
    • SqlStdOperatorTable (with overrides against the standard)
    • PinotOperatorTable (with all Transform/AggregateFunctionTypes registered)
    • PinotCatalog (with all ScalarFunctionTypes registered)
  • Altered the ChainOperatorTable rule to look up from SqlStdOperatorTable first, then PinotOperatorTable, then PinotCatalog
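
The lookup order described in the last bullet can be pictured with a small, self-contained sketch. It does not use the Calcite or Pinot classes; `NamedTable` and `ChainedLookup` are hypothetical stand-ins, and the sketch assumes a simplified "first table that knows the name wins" rule, which may differ from how the real chained operator table combines candidates.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for one operator table: maps a case-insensitive name to a label. */
final class NamedTable {
  private final String label;
  private final Map<String, String> entries = new LinkedHashMap<>();

  NamedTable(String label) {
    this.label = label;
  }

  NamedTable register(String functionName) {
    entries.put(functionName.toUpperCase(Locale.ROOT), label + ":" + functionName);
    return this;
  }

  Optional<String> lookup(String functionName) {
    return Optional.ofNullable(entries.get(functionName.toUpperCase(Locale.ROOT)));
  }
}

/** Consults the tables in the order given and returns the first match (a simplifying assumption). */
final class ChainedLookup {
  private final List<NamedTable> tables;

  ChainedLookup(List<NamedTable> tables) {
    this.tables = tables;
  }

  Optional<String> lookup(String functionName) {
    for (NamedTable table : tables) {
      Optional<String> hit = table.lookup(functionName);
      if (hit.isPresent()) {
        return hit;
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    ChainedLookup chain = new ChainedLookup(Arrays.asList(
        new NamedTable("std").register("SUM"),
        new NamedTable("pinotOperator").register("SUM").register("DISTINCTCOUNT"),
        new NamedTable("pinotCatalog").register("arrayElementAt")));
    System.out.println(chain.lookup("sum"));            // Optional[std:SUM]
    System.out.println(chain.lookup("arrayelementat")); // Optional[pinotCatalog:arrayElementAt]
    System.out.println(chain.lookup("unknown"));        // Optional.empty
  }
}
```

In this sketch a name registered in both the standard table and the Pinot operator table resolves to the standard entry, which is the practical consequence of putting the standard table first in the chain.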

Feature Changes Details

  • Added the arrayElementAt method
    • as a demonstration of ArgType-based lookup, with function overloading based on input types
    • also as a "backward-compatibility" showcase that still supports arrayElementAtString/Int/Long/Float, originally on v1
  • Added the arrayValueConstructor method (originally Support Array constructor using literal evaluation #12278)
    • deprecated VAR_ARG_INDEX: with the Calcite function map we no longer need it
    • support for dynamic type-check/return-inference SqlUserDefinedFunction
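
As a rough illustration of argument-type based lookup with overloading, the sketch below resolves one function name to different implementations depending on the argument types, while the old type-suffixed names keep working as aliases. `OverloadRegistry` and its methods are hypothetical and stand in for the real registry; plain `Class` objects are assumed in place of Calcite's type system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical registry: several overloads may share one function name. */
final class OverloadRegistry {
  /** One overload: its parameter types plus a label standing in for the implementation. */
  private static final class Overload {
    final String target;
    final Class<?>[] paramTypes;

    Overload(String target, Class<?>[] paramTypes) {
      this.target = target;
      this.paramTypes = paramTypes;
    }
  }

  private final Map<String, List<Overload>> byName = new HashMap<>();

  void register(String name, String target, Class<?>... paramTypes) {
    byName.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
        .add(new Overload(target, paramTypes));
  }

  /** Returns the first overload whose parameters accept the argument types, or null if none does. */
  String resolve(String name, Class<?>... argTypes) {
    List<Overload> candidates =
        byName.getOrDefault(name.toLowerCase(Locale.ROOT), Collections.emptyList());
    for (Overload candidate : candidates) {
      if (accepts(candidate.paramTypes, argTypes)) {
        return candidate.target;
      }
    }
    return null;
  }

  private static boolean accepts(Class<?>[] params, Class<?>[] args) {
    if (params.length != args.length) {
      return false;
    }
    for (int i = 0; i < params.length; i++) {
      if (!params[i].isAssignableFrom(args[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    OverloadRegistry registry = new OverloadRegistry();
    // New-style: one name, overloaded by input type.
    registry.register("arrayElementAt", "intImpl", int[].class, int.class);
    registry.register("arrayElementAt", "stringImpl", String[].class, int.class);
    // Backward-compatible alias under the old type-suffixed name.
    registry.register("arrayElementAtInt", "intImpl", int[].class, int.class);

    System.out.println(registry.resolve("arrayElementAt", String[].class, int.class)); // stringImpl
    System.out.println(registry.resolve("arrayElementAtInt", int[].class, int.class)); // intImpl
    System.out.println(registry.resolve("arrayElementAt", double[].class, int.class)); // null
  }
}
```

Returning `null` when nothing matches mirrors the nullable lookup discussed later in this thread; a stricter variant would throw instead.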

Reference

Next Step

@codecov-commenter

codecov-commenter commented Jan 22, 2024

Codecov Report

Attention: 95 lines in your changes are missing coverage. Please review.

Comparison is base (7978d29) 61.62% compared to head (ffb940b) 61.57%.

Files                                                    Patch %   Lines
...e/pinot/common/function/scalar/ArrayFunctions.java    27.27%    36 Missing and 4 partials ⚠️
...apache/pinot/common/function/FunctionRegistry.java    78.33%    16 Missing and 10 partials ⚠️
...common/function/sql/PinotCalciteCatalogReader.java    31.25%    8 Missing and 3 partials ⚠️
.../pinot/common/function/sql/PinotOperatorTable.java    56.00%    9 Missing and 2 partials ⚠️
...ot/common/function/sql/PinotSqlScalarFunction.java    0.00%     4 Missing ⚠️
...ot/common/function/schema/PinotScalarFunction.java    84.21%    3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12302      +/-   ##
============================================
- Coverage     61.62%   61.57%   -0.05%     
+ Complexity     1152      207     -945     
============================================
  Files          2421     2427       +6     
  Lines        131809   132173     +364     
  Branches      20343    20404      +61     
============================================
+ Hits          81227    81387     +160     
- Misses        44628    44795     +167     
- Partials       5954     5991      +37     
Flag                     Coverage Δ
custom-integration1      <0.01% <0.00%> (-0.01%) ⬇️
integration              <0.01% <0.00%> (-0.01%) ⬇️
integration1             <0.01% <0.00%> (-0.01%) ⬇️
integration2             0.00% <0.00%> (ø)
java-11                  61.57% <62.89%> (+<0.01%) ⬆️
java-21                  <0.01% <0.00%> (-61.51%) ⬇️
skip-bytebuffers-false   61.57% <62.89%> (-0.03%) ⬇️
skip-bytebuffers-true    0.00% <0.00%> (-61.49%) ⬇️
temurin                  61.57% <62.89%> (-0.05%) ⬇️
unittests                61.57% <62.89%> (-0.05%) ⬇️
unittests1               46.77% <62.89%> (+<0.01%) ⬆️
unittests2               27.62% <0.00%> (-0.09%) ⬇️


@walterddr walterddr force-pushed the pr_function_reg_consolidation_with_array branch from e0aaa7e to 847ccf4 Compare January 22, 2024 20:31
@walterddr walterddr marked this pull request as ready for review January 22, 2024 21:53
@walterddr walterddr added feature multi-stage Related to the multi-stage query engine labels Jan 22, 2024
@walterddr walterddr changed the title [multistage][feature] FunctionRegistry unification attempt2 [multistage][feature] FunctionRegistry unification Jan 22, 2024
List<PinotScalarFunction> candidates = getFunctionMap().range(functionName, CASE_SENSITIVITY).stream().filter(
e -> e.getValue() instanceof PinotScalarFunction && (e.getValue().getParameters().size() == numParams
|| ((PinotScalarFunction) e.getValue()).isVarArgs())).map(e -> (PinotScalarFunction) e.getValue())
.collect(Collectors.toList());
Contributor

This would be easier to read with a fluent notation like:

    List<PinotScalarFunction> candidates = getFunctionMap()
       .range(functionName, CASE_SENSITIVITY)
       .stream()
       .filter(e -> e.getValue() instanceof PinotScalarFunction)
       .map(e -> (PinotScalarFunction) e.getValue())
       .filter(fun -> fun.getParameters().size() == numParams || fun.isVarArgs())
       .collect(Collectors.toList());

Contributor Author

👍

return candidates.get(0).getFunctionInfo();
} else {
// TODO: convert this to throw IAE when all operator has scalar equivalent.
return null;
Contributor

Is this TODO going to be resolved after this PR is merged?

Contributor Author

Yes, but it will take quite a while. Once every scalar function matches its transform counterpart, we will throw here (i.e. when we find something a transform can do but the scalar function can't).

@@ -94,83 +154,195 @@ private FunctionRegistry() {
public static void init() {
}

@VisibleForTesting
public static void registerFunction(Method method) {
registerFunction(method, Collections.singleton(method.getName()), false, false);
Contributor

When will this function be called? Specifically, this method does not seem to be thread safe. If one thread is registering a function while another is looking functions up, the behavior is unspecified. Is that something that can happen? It looks like the method is public for testing purposes, so I would assume production code won't call it, but:

  • We should specify that in the Javadoc of this method
  • Does it mean we cannot run tests concurrently?

Contributor Author

@walterddr walterddr Jan 23, 2024

This is only for testing. Normal registration occurs in the static block of this class.
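
The distinction drawn in this exchange, registration in a static block versus a test-only hook that may run later, can be sketched as follows. `StaticRegistrySketch` is hypothetical, and backing it with a `ConcurrentHashMap` is an assumption made for the sketch rather than a description of what the PR does.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry illustrating static-block registration plus a test-only hook. */
final class StaticRegistrySketch {
  // A concurrent map keeps late (test-time) registration safe against concurrent readers.
  private static final Map<String, String> FUNCTIONS = new ConcurrentHashMap<>();

  static {
    // Production-style registration: runs once during class initialization,
    // which the JVM performs under an initialization lock.
    FUNCTIONS.put("upper", "builtin-upper");
    FUNCTIONS.put("lower", "builtin-lower");
  }

  private StaticRegistrySketch() {
  }

  /** Test-only hook; production code in this sketch never calls it. */
  static void registerForTesting(String name, String implementation) {
    FUNCTIONS.put(name.toLowerCase(Locale.ROOT), implementation);
  }

  /** Returns the registered implementation label, or null if the name is unknown. */
  static String lookup(String name) {
    return FUNCTIONS.get(name.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(lookup("UPPER"));      // builtin-upper
    registerForTesting("myTestFn", "test-impl");
    System.out.println(lookup("mytestfn"));   // test-impl
  }
}
```

Even with a concurrent map, tests that register functions still share one static registry, so tests running in the same JVM can observe each other's registrations.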

@walterddr walterddr force-pushed the pr_function_reg_consolidation_with_array branch 2 times, most recently from 474a450 to b381516 Compare January 24, 2024 19:50
Rong Rong added 9 commits January 26, 2024 08:24
1. FunctionRegistry keeps the old FUNCTION_INFO_MAP only
2. moved Calcite Catalog-based schema.Function registry to its own package; along with a SqlOperator based PinotOperatorTable
3. both CatalogReader and OperatorTable utilizes ground truth function from PinotFunctionRegistry --> will be default once deprecate FunctionRegistry
4. PinotFunctionRegistry provides argument-type based lookup via the same method SqlValidator utilize to lookup routine (and lookup operator overload)
5. clean up multi-stage engine side accordingly
- use signature type lookup for v2 engine
- deprecate usage of FunctionRegistry
- allow nullable return from function lookup b/c some operators doesn't
  have scalar equivalent at the moment
- merge PinotFunctionRegistry with FunctionRegistry
- renamed to match calcite.schema and calcite.sql from
  pinot.common.function package
- allow complex transform and other dynamic operand/return inference
- example added for array value constructor
- support arrayElementAt
- support arrayValuConstructor
@walterddr walterddr force-pushed the pr_function_reg_consolidation_with_array branch from b381516 to ffb940b Compare January 26, 2024 16:24
// Walk through all the Pinot aggregation types and
// 1. register those that are supported in multistage in addition to calcite standard opt table.
// 2. register special handling that differs from calcite standard.
for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) {
Contributor

I guess this is not the target of this PR, but how difficult would it be to also have operations that are read from the classpath, like we do with functions?

Contributor Author

It would be overengineering IMO, since we don't have UDAFs.

If we wrapped the operators in annotated classes now, we might not have all the primitives ready for UDAFs in the future, so I haven't made that change.

Regardless, we can do that in a separate PR.

}

/**
* Registers a method with the given function name.
* Returns the {@link FunctionInfo} associated with the given function name and number of parameters, or {@code null}
Contributor

The Javadoc here is not correct, right? Isn't #getFunctionInfo(String, int) (declared above) the one that Returns the FunctionInfo associated with the given function name and number of parameters ?

Contributor Author

Good catch. This Javadoc was copied and I should have changed it, but didn't when I merged it with walterddr#92.

SqlOperandTypeChecker operandTypeChecker =
(SqlOperandTypeChecker) clazz.getField("OPERAND_TYPE_CHECKER").get(null);
for (Method method : clazz.getMethods()) {
if (method.getName().equals("eval")) {
Contributor

Small improvement: could we annotate the methods instead of requiring a special method name? I guess the difference is not huge and it may be a personal preference, but it can be easier to find implementations by looking for usages of an annotation.

Contributor

Same with special fields like RETURN_TYPE_INFERENCE or OPERAND_TYPE_CHECKER. Why not just force these classes to implement a method that returns these values?

Contributor Author

@walterddr walterddr Feb 15, 2024

Both are great suggestions/improvements. There's no particular reason; I simply felt:

  1. there's no need to additionally annotate every callable method if the class is already annotated (it might get verbose with lots of polymorphism)
  2. for the special fields, yes, we can use an interface method stub; it just didn't occur to me that we would need different ones within the same annotated function class

Both can and should be done if there's more benefit that way.
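
To make the trade-off in this thread concrete, here is a minimal sketch of the two discovery styles: finding callable methods by a fixed name versus finding them by an annotation. The `@EvalMethod` annotation, `SampleArrayFunction`, and `EvalMethodScanner` are hypothetical names invented for illustration and are not part of the PR.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical marker for methods that should be exposed as callable overloads. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface EvalMethod {
}

/** Hypothetical function class mixing both conventions. */
final class SampleArrayFunction {
  @EvalMethod
  public static int elementAt(int[] values, int index) {
    return values[index];
  }

  @EvalMethod
  public static String elementAt(String[] values, int index) {
    return values[index];
  }

  // Found only by the name-based convention: no annotation, but named "eval".
  public static long eval(long[] values, int index) {
    return values[index];
  }

  // Neither annotated nor named "eval", so neither scan picks it up.
  public static String describe() {
    return "sample";
  }
}

final class EvalMethodScanner {
  /** Annotation-based discovery: any method name works as long as it is marked. */
  static List<String> annotated(Class<?> clazz) {
    List<String> found = new ArrayList<>();
    for (Method method : clazz.getDeclaredMethods()) {
      if (method.isAnnotationPresent(EvalMethod.class)) {
        found.add(signature(method));
      }
    }
    Collections.sort(found);
    return found;
  }

  /** Name-based discovery: every method with the agreed name is picked up. */
  static List<String> named(Class<?> clazz, String methodName) {
    List<String> found = new ArrayList<>();
    for (Method method : clazz.getDeclaredMethods()) {
      if (method.getName().equals(methodName)) {
        found.add(signature(method));
      }
    }
    Collections.sort(found);
    return found;
  }

  private static String signature(Method method) {
    return method.getName() + Arrays.stream(method.getParameterTypes())
        .map(Class::getSimpleName)
        .collect(Collectors.joining(",", "(", ")"));
  }

  public static void main(String[] args) {
    System.out.println(annotated(SampleArrayFunction.class));
    System.out.println(named(SampleArrayFunction.class, "eval"));
  }
}
```

The annotation variant lets overloads carry descriptive names and makes them discoverable through "find usages" on the annotation, at the cost of one extra marker per method, which is the verbosity concern raised in point 1.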

Contributor

@gortiz gortiz Feb 16, 2024

I see. I guess it won't happen in the short term, but I think we strongly need to move from a system where FunctionInvoker does _method.invoke(_instance, args) to one where the equivalent of FunctionInvoker always calls PinotFunction.eval(Object[]).

PinotFunction would be defined as:

public interface PinotFunction extends Function {
  SqlOperandTypeChecker getOperandTypeChecker();
  SqlReturnTypeInference getReturnTypeInference();
  Object eval(Object[] args);
}

public abstract class SumPinotFunction implements PinotFunction {
  @Override
  public SqlReturnTypeInference getReturnTypeInference() {
    return ReturnTypes.NULLABLE_SUM;
  }

  // Null handling is shared by every type-specific subclass.
  @Override
  public Object eval(Object[] args) {
    if (args[0] == null) return args[1];
    if (args[1] == null) return args[0];
    return evalNotNull(args);
  }

  public abstract Object evalNotNull(Object[] args);

  @ScalarFunction
  public static class SumIntPinotFunctionSupplier extends SumPinotFunction {
    @Override
    public SqlOperandTypeChecker getOperandTypeChecker() {
      // Placeholder: should return something that defines int args and arity 2.
      throw new UnsupportedOperationException("define int args and arity 2");
    }

    @Override
    public Object evalNotNull(Object[] args) {
      int i1 = (int) args[0];
      int i2 = (int) args[1];

      return i1 + i2;
    }
  }
  // Same with the other options
}

@walterddr walterddr closed this May 31, 2024
Labels
feature multi-stage Related to the multi-stage query engine
3 participants