-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multistage][feature] FunctionRegistry unification #12302
[multistage][feature] FunctionRegistry unification #12302
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #12302 +/- ##
============================================
- Coverage 61.62% 61.57% -0.05%
+ Complexity 1152 207 -945
============================================
Files 2421 2427 +6
Lines 131809 132173 +364
Branches 20343 20404 +61
============================================
+ Hits 81227 81387 +160
- Misses 44628 44795 +167
- Partials 5954 5991 +37
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
e0aaa7e
to
847ccf4
Compare
List<PinotScalarFunction> candidates = getFunctionMap().range(functionName, CASE_SENSITIVITY).stream().filter( | ||
e -> e.getValue() instanceof PinotScalarFunction && (e.getValue().getParameters().size() == numParams | ||
|| ((PinotScalarFunction) e.getValue()).isVarArgs())).map(e -> (PinotScalarFunction) e.getValue()) | ||
.collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be easier to read with a fluent notation like:
List<PinotScalarFunction> candidates = getFunctionMap()
.range(functionName, CASE_SENSITIVITY)
.stream()
.filter(e -> e.getValue() instanceof PinotScalarFunction)
.map(e -> (PinotScalarFunction) e.getValue())
.filter(fun.getParameters().size() == numParams || fun.isVarArgs())
.collect(Collectors.toList());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
return candidates.get(0).getFunctionInfo(); | ||
} else { | ||
// TODO: convert this to throw IAE when all operator has scalar equivalent. | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO going to be resolved after this PR is merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes but it will take quite a while. once we have every scalar function matches the transform we will throw here (e.g. we found a discrepancy between what transform can do but scalar can't)
@@ -94,83 +154,195 @@ private FunctionRegistry() { | |||
public static void init() { | |||
} | |||
|
|||
@VisibleForTesting | |||
public static void registerFunction(Method method) { | |||
registerFunction(method, Collections.singleton(method.getName()), false, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this function will be called? Specifically, this method seems to be not thread safe. If some thread is registering a function and another is looking for functions, the behavior is not specified. Is that something that can happen? It looks like the method is public for testing proposes, so I would assume the production code won't call this method but:
- We should specify that in the Javadoc of this method
- Does it mean we cannot run tests concurrently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is only for testing. normal registration occur during static block of this class
474a450
to
b381516
Compare
1. FunctionRegistry keeps the old FUNCTION_INFO_MAP only 2. moved Calcite Catalog-based schema.Function registry to its own package; along with a SqlOperator based PinotOperatorTable 3. both CatalogReader and OperatorTable utilizes ground truth function from PinotFunctionRegistry --> will be default once deprecate FunctionRegistry 4. PinotFunctionRegistry provides argument-type based lookup via the same method SqlValidator utilize to lookup routine (and lookup operator overload) 5. clean up multi-stage engine side accordingly
- use signature type lookup for v2 engine - deprecate usage of FunctionRegistry - allow nullable return from function lookup b/c some operators doesn't have scalar equivalent at the moment
- merge PinotFunctionRegistry with FunctionRegistry - renamed to match calcite.schema and calcite.sql from pinot.common.function package
- allow complex transform and other dynamic operand/return inference - example added for array value constructor
- support arrayElementAt - support arrayValuConstructor
b381516
to
ffb940b
Compare
// Walk through all the Pinot aggregation types and | ||
// 1. register those that are supported in multistage in addition to calcite standard opt table. | ||
// 2. register special handling that differs from calcite standard. | ||
for (AggregationFunctionType aggregationFunctionType : AggregationFunctionType.values()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess is not the target of this PR but... how difficult would it be to also have operations that are read from the classpath like we do with functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be an overengineering IMO --> since we dont have UDAF.
if any changes we do here creating the operator wrapped in class with annotations we might not have all the primitives ready for UDAF in the future. thus i haven't make the changes;
but regardless we can do that in separate PR
} | ||
|
||
/** | ||
* Registers a method with the given function name. | ||
* Returns the {@link FunctionInfo} associated with the given function name and number of parameters, or {@code null} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Javadoc here is not correct, right? Isn't #getFunctionInfo(String, int)
(declared above) the one that Returns the FunctionInfo associated with the given function name and number of parameters
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. this was a copied javadoc. should've changed. but didn't when i merged it with walterddr#92
SqlOperandTypeChecker operandTypeChecker = | ||
(SqlOperandTypeChecker) clazz.getField("OPERAND_TYPE_CHECKER").get(null); | ||
for (Method method : clazz.getMethods()) { | ||
if (method.getName().equals("eval")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small improvement: Could we annotate the methods instead of forcing to use a special method name? I guess the difference is not huge and it may be a personal preference, but it can be easier to look for implementations by looking for usages of an annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with special fields like RETURN_TYPE_INFERENCE
or OPERAND_TYPE_CHECKER
. Why not just force these classes to implement a method that returns these values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both are great suggestions/improvements. there's no reason, i simply feel:
- there's no need to annotate additionally to all callable methods if we already have the class annotated (might be verbose with lots of polymorphism)
- for the special fields yeah we can use interface method stub. it is just didn't occur to me that we need different ones within the same annotated function class
both can and should be done if there's more benefit that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I guess it won't be in the short term, but I think we strongly need to move from a system where FunctionInvoker
does _method.invoke(_instance, args)
to a system where the equivalent to FunctionInvoker
always call PinotFunction.eval(Object[])
.
PinotFunction
would be defined as:
public interface PinotFunction extends Function {
SqlOperandTypeChecker getOperandTypeChecker();
SqlReturnTypeInference getReturnTypeInference();
Object eval(Object[] args);
}
public abstract class SumPinotFunction implements PinotFunction {
public SqlReturnTypeInference getReturnTypeInference() {
return ReturnTypes.NULLABLE_SUM;
}
public Object eval(Object[] args) {
if (args[0] == null) return args[1];
if (args[1] == null) return args[0];
return evalNotNull(args);
}
public abstract evalNotNull(Object[] args);
@ScalarFunction
public class SumIntPinotFunctionSupplier extends SumPinotFunction {
public SqlOperandTypeChecker getOperandTypeChecker() {
return // something that defines int args and arity 2
}
public Object evalNotNull(Object[] args) {
int i1 = args[0];
int i2 = args[1];
return i1 + i1
}
}
// Same with the other options
}
this is #12229 improvement to address the recent modification to #12278
Design Doc
https://docs.google.com/document/d/1VcH8k6vU3dTdb717qJo1bgZAAIkeTso6KwKXkzpbAjk/edit
Framework Changes Details
Feature Changes Details
arrayElementAt
methodarrayElementAtString/Int/Long/Float
originally on v1arrayValueConstructor
method (originally Support Array constructor using literal evaluation #12278)Reference
Next Step