-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Coral-Schema] Add DATE type support in RelDataTypeToAvroType #484
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR!
schema.addProp("logicalType", "timestamp-millis"); | ||
return schema; | ||
case DATE: | ||
schema = Schema.create(Schema.Type.INT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there's no Date
type schema in avro https://github.com/apache/avro/blob/branch-1.10/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L125, how did we choose 'int' here?
What is the default spark derived schema for 'Date' type field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/spark/blob/master/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L145
Spark regards this avro schema as date
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so spark derived date
type is cast-able to coral derived 'int' type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Spark AvroDeserializer would return date
type directly for this schema, and spark.table.printSchema
also returns date
type, Spark won't cast to int type.
"type" : [ "null", { | ||
"type" : "int", | ||
"logicalType" : "date" | ||
} ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the hive schema representation of 'Date' data type column? who injects the 'logicalType'? why is this not just
{
"name" : "Date_Field",
"type" : [ "null", "date" ]
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Avro schema representing date
type, "type" : [ "null", "date" ]
is not valid in Avro.
coral-schema/src/main/java/com/linkedin/coral/schema/avro/RelDataTypeToAvroType.java
Show resolved
Hide resolved
have we validated that this approach works with and without viewshift for the impacted dataset? |
@aastha25 Tested it on the target viewshift replacement view, which couldn't be translated & queried without this PR, but can be translated & queried with this PR. |
It'll be good to test the top level view (which invokes this nested viewshift replacement view), to make sure that the schema generated for viewshift view (after comparing spark inferred schema & coral provided schema) is compatible with the top level view. |
// Spark recognizes the data type of {"type": "int", "logicalType": "date"} as a date type: | ||
// https://github.com/apache/spark/blob/master/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L145 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the Spark comment relevant?
Especially, the code below converts a Calcite Date Type to Avro Date logical type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our use case, Avro schema needs to be deserializer by Spark for upcast, but agreed that the Spark-related comment should not be added here since Coral-Schema is not just used for Spark.
What changes are proposed in this pull request, and why are they necessary?
This patch added the
DATE
type support inRelDataTypeToAvroType
to prevent the exceptionjava.lang.UnsupportedOperationException: DATE is not supported.
How was this patch tested?