Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce OrdinalReturnTypeInferenceV2 to infer RexCall's return type #481

Merged
merged 6 commits into from
Dec 15, 2023

Conversation

aastha25
Copy link
Contributor

@aastha25 aastha25 commented Dec 6, 2023

What changes are proposed in this pull request, and why are they necessary?

Given a SQL definition - SELECT udf(StructCol) AS modCol FROM foo

Prior to this PR, Coral-Schema generates a nullable schema for modCol with top level and inner fields set to nullable even when StructCol is non-nullable. Since udf's return type nullability inference is unavailable to RelToAvroSchemaConverter, this shuttle assumes the schema after applying th UDF is nullable.

This PR introduces a new SqlReturnTypeInference - OrdinalReturnTypeInferenceV2which has been leveraged in RelToAvroSchemaConverter to derive udf - cast_nullability's return type schema (and its nullability info) more accurately. This inference strategy has been applied when the SqlCall's return type is same as one of the operands type AND that operand refers to a field from the input schema.

How was this patch tested?

./gradlew build
added new UTs.

@aastha25 aastha25 changed the title [WIP] [DO NOT REVIEW] OrdinalReturnTypeInferenceV2 to infer RexCall return type Introduce OrdinalReturnTypeInferenceV2 to infer RexCall's return type Dec 13, 2023
Copy link
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aastha25 for this PR!
Could you kick off a regression test for it for safety?
And have we tested on the target view? Could you add the test results in the testing done?

@aastha25
Copy link
Contributor Author

@ljfgem

Thanks @aastha25 for this PR! Could you kick off a regression test for it for safety?
This is a new code path and only targets the UDF schema generated for the newly defined SqlReturnTypeInference.

And have we tested on the target view? Could you add the test results in the testing done?
The target view is unavailable. Hence the UDF has been tested within the realms of Coral.

@ljfgem
Copy link
Collaborator

ljfgem commented Dec 15, 2023

@aastha25 Thanks for confirmation.
Could we collaborate with the user to produce a test view and ensure this PR works for it on Spark shell?

@aastha25
Copy link
Contributor Author

@aastha25 Thanks for confirmation. Could we collaborate with the user to produce a test view and ensure this PR works for it on Spark shell?

unfortunately the view will be created & tested by the user next week, after Coral & Spark enable support for it. In case of any issues, we'll need to work on a potential follow up PR.

Copy link
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aastha25 aastha25 merged commit 08e43df into linkedin:master Dec 15, 2023
1 check passed
return rexCall;
}
}

RelDataType fieldType = rexCall.getType();
boolean isNullable = SchemaUtilities.isFieldNullable(rexCall, inputSchema);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the reason OrdinalReturnTypeInference does not work out of the box?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[OrdinalReturnTypeInference](https://github.com/linkedin/linkedin-calcite/blob/li-1.21.0/core/src/main/java/org/apache/calcite/sql/type/OrdinalReturnTypeInference.java#L25) cannot be directly used because the private field 'ordinal' has no public accessor method. This class supports type derivation via method RelDataType inferReturnType(SqlOperatorBinding opBinding). To leverage this API, we need to make coral-schema complaint with Coral IR and enable type derivation in coral-schema, similar to our ongoing work in coral-spark & coral-trino

Copy link
Contributor

@wmoustafa wmoustafa Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So once we append a RexCall field, what gets used at the end to infer its type if not inferReturnType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we append the field AFTER extracting the ordinal here and deriving its field type here

Since this is a rexShuttle, the field types from the inputSchema are available in the RelNode representation.

SchemaUtilities.isFieldNullable(rexCall, inputSchema) derives incorrect nullability (always nullable) for the top level fields and SchemaUtilities.makeNullable here derives incorrect nullability (always nullable) for the inner fields.

By introducing the if condition above in line 445, we avoid those codepaths

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants